When making use of blood tests in an attempt to exonerate a supposedly falsely accused man of a charge of paternity, it is helpful to have an estimate of the chances of making an exclusion with a given blood group system, especially the probabilities when the blood group of the accused man is known. A knowledge of such probabilities will influence the number of blood group systems employed, and enable the expense of any given set of tests to be weighed against the probable benefit.

General formulas which enable such calculations to be made for populations possessing any given set of gene frequencies have been derived for the ABO, MN, and MNS systems (Wiener, Lederer and Polayes 1930; Zarnik 1930; Wiener 1952; Boyd 1955). They are of course particularly simple for a system consisting of a single pair of genes, one of them dominant (Wiener, Lederer and Polayes 1930; Cotterman 1951; Race and Sanger 1950, 1954). Thus far no such formula has been derived for the more complicated Rh system, and the only information available is a table constructed in 1944 by R. A. Fisher, based on approximate gene frequencies which might apply to the English population, and printed, without explanation of how it was constructed, by Race and Sanger (1950, 1954). These results would obviously not apply to populations with different Rh gene frequencies, and one may ask how exact these results, based on approximate gene frequencies, are even for the English population. The purpose of the present communication is to supply the required general formulas.

The first requirement in such calculations is a table showing the frequency with which children of the various phenotypes are born to women of the various phenotypes, when such women are supposed to be mated to men drawn at random from the general population. Such frequencies are given in table 1, which was constructed by examining the children which would be born to women of each phenotype when they receive the various Rh genes in numbers proportional to the frequencies of these genes in the population. (Table 1 would of course apply equally to father-child combinations.) For the sake of brevity, the frequencies of the genes R2, R0, R'', r, Rz, R1 and R' are represented by the letters t, u, v, w, x, y, z in that order (the order given by Fisher 1946). Using these symbols, the expected frequencies of the various phenotypes are
$$
\begin{aligned}
\text{1. cde} &= w^2 \\
\text{2. cdE} &= v^2 + 2vw \\
\text{3. cDe} &= u^2 + 2uw \\
\text{4. cDE} &= t^2 + 2tv + 2tu + 2tw + 2uv \\
\text{5. Cde/c} &= 2wz \\
\text{6. CdE/c} &= 2vz \\
\text{7. CDe/c} &= 2(uy + wy + uz) \\
\text{8. CDE/c} &= 2(tx + ty + tz + vx + vy + ux + wx) \\
\text{9. Cde/C} &= z^2 \\
\text{10. CdE/C} &: \text{usually absent} \\
\text{11. CDe/C} &= y^2 + 2yz \\
\text{12. CDE/C} &= x^2 + 2xy + 2xz
\end{aligned}
$$

¹ The research reported in this paper was made possible in part by the use of equipment purchased by Boston University under Contract No. Nonr-492(01) with the Office of Naval Research.


In table 1 the letter k stands for t + u + v + w, and p stands for x + y + z. The rare gene Ry, although probably present, is assumed to be absent; in any case it is so rare as to contribute very little to the chances of exclusion. Therefore the very rare phenotype CdE/C (RyR'), which would be number 10 in a complete listing of the 12 phenotypes distinguishable with 4 sera (Boyd 1954b), is not included. Omitting Ry, which contributes nothing significant for our purposes, considerably simplifies the tables and calculations.

From table 1, knowing the frequencies of the various genes in the population, one can construct a table of the numerical frequencies with which children of the various types are born to women of the various types. This has been done in table 2, which shows the values for a population having the gene frequencies derived by Fisher (1946, 1947) from the data of Fisher and Race (1946). These frequencies are, to four decimal places,


Gene   Symbol   Frequency
R2       t       0.1280
R0       u       0.0305
R''      v       0.0170
r        w       0.3790
Rz       x       0.0013
R1       y       0.4361
R'       z       0.0081
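The phenotype formulas listed earlier can be evaluated directly for these gene frequencies. The Python sketch below is our own illustration, and the function name `rh_phenotype_frequencies` is hypothetical. As a consistency check, the eleven expressions must add up to (k + p)² = 1, and the results should fall within about 0.0001 of the population frequencies listed in table 4. The small differences presumably arise because the paper worked from gene frequencies carried to six decimals rather than the four-decimal values quoted here.

```python
def rh_phenotype_frequencies(t, u, v, w, x, y, z):
    """Expected Rh phenotype frequencies under random mating.

    t..z are the frequencies of R2, R0, R'', r, Rz, R1 and R'; Ry is omitted,
    so phenotype 10 (CdE/C) does not appear.
    """
    return {
        "cde": w**2,
        "cdE": v**2 + 2*v*w,
        "cDe": u**2 + 2*u*w,
        "cDE": t**2 + 2*t*v + 2*t*u + 2*t*w + 2*u*v,
        "Cde/c": 2*w*z,
        "CdE/c": 2*v*z,
        "CDe/c": 2*(u*y + w*y + u*z),
        "CDE/c": 2*(t*x + t*y + t*z + v*x + v*y + u*x + w*x),
        "Cde/C": z**2,
        "CDe/C": y**2 + 2*y*z,
        "CDE/C": x**2 + 2*x*y + 2*x*z,
    }


# Fisher's (1946, 1947) estimates for the English population, four decimals.
freqs = rh_phenotype_frequencies(t=0.1280, u=0.0305, v=0.0170, w=0.3790,
                                 x=0.0013, y=0.4361, z=0.0081)
for phenotype, f in freqs.items():
    print(f"{phenotype:6s} {f:.4f}")
print("total ", round(sum(freqs.values()), 6))  # (k + p)^2 = 1
```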


The dashes in table 2 indicate that no child of the type indicated at the left could be born to a mother of the type indicated above. In the case of row 1, column 6, and row 6, columns 1 and 3, this is the result of ignoring the gene Ry. This gene is so rare that even if we were to take account of it nothing significant would be contributed to these entries in the table.

Following the generally accepted practice of carrying more decimals through a computation than it is desired to retain at the end (Boyd 1954a), all the calculations in this paper have been carried out to six decimal places and finally rounded off to four. This explains occasional differences in the fourth place between the values given in tables 2 and 4 and the values which would result from four-figure accuracy alone.

TABLE 2. NUMERICAL FREQUENCIES OF VARIOUS MOTHER-CHILD COMBINATIONS

                              Phenotype of mother
Phenotype of      1      2      3      4      5      6      7      8      9     11     12
child           cde    cdE    cDe    cDE  Cde/c  CdE/c  CDe/c  CDE/c  Cde/C  CDe/C  CDE/C

 1. cde       .0544  .0024  .0044  .0184  .0012      -  .0626  .0002      -      -      -
 2. cdE       .0024  .0028  .0002  .0019  .0001  .0001  .0028  .0029      -      -      -
 3. cDe       .0044  .0002  .0055  .0033  .0001      -  .0106  .0000      -      -      -
 4. cDE       .0184  .0019  .0033  .0466  .0004  .0000  .0231  .0329      -      -      -
 5. Cde/c     .0012  .0001  .0001  .0004  .0012  .0001  .0014  .0004  .0000  .0013  .0000
 6. CdE/c         -  .0001      -  .0000  .0001  .0000  .0000  .0001  .0000  .0001  .0000
 7. CDe/c     .0626  .0028  .0106  .0231  .0014  .0000  .1513  .0262  .0000  .0794  .0002
 8. CDE/c     .0002  .0029  .0000  .0329  .0004  .0001  .0262  .0383  .0000  .0285  .0004
 9. Cde/C         -      -      -      -  .0000  .0000  .0000  .0000  .0000  .0000  .0000
11. CDe/C         -      -      -      -  .0013  .0001  .0794  .0285  .0000  .0876  .0003
12. CDE/C         -      -      -      -  .0000  .0000  .0002  .0004  .0000  .0003  .0003

It is next necessary to know which combinations of mother and child will exclude paternity for a man of given type, allowing for the fact that children of some phenotypes will not be born to mothers of certain types. This information is presented compactly in table 3 by representing the various phenotypes by numbers. The numbers which stand for the various phenotypes are shown both along the side of the table and at the top.

Note that phenotype 8 is not listed at the top, as a man of this phenotype cannot establish non-paternity by the Rh blood groups. Note also that the omission of Ry from consideration means that certain (very rare) exclusions are omitted. For instance, if Ry were present, phenotype 6 (CdE/c = Rh'Rh'') would include genotype Ryr (CdE/cde), and such a woman, mated (for example) to a man of type 1 (cde = rh), could produce a child of phenotype 6, which, in the case of men of phenotype 9, 11 or 12 (Cde/C, CDe/C or CDE/C), would exclude paternity. Such possible (but very rare) children have not been included in table 3. On the other hand, since Ry is probably present, although very rare, in our population, no exclusions which could be based upon its absence have been included. For example, if Ry were not present, a woman of type 6 and a man of type 7 could not have a child of type 12, but this and other such exclusions have not been included in table 3.

Table 3 was worked out with the aid of the CDE notation, which is much more 
convenient than Wiener’s for this purpose, but for the benefit of readers who may 
be more familiar with Wiener’s notation, the designations of the phenotypes are 
given in this system as well. 

In the case of the simpler blood group systems, it is possible to derive general formulas for the probabilities of exclusion by noting the phenotypes of children which would exclude a man of one phenotype when accused by a woman of each of the types, and adding all the expressions for their frequency. One thereby obtains a formula which, since it gives the total probability, for each type of man, of finding mother-child combinations which exclude paternity, gives his chances of establishing non-paternity. Then by multiplying each of these expressions by the general formula for the frequency of the man's phenotype, and summing over all phenotypes, one obtains a general formula for the chances of a man, blood group unknown, establishing non-paternity.
TABLE 4. PROBABILITIES OF EXCLUSION OF PATERNITY FOR MEN OF VARIOUS PHENOTYPES

Phenotype      Frequency in Population    Probability of Exclusion
 1. cde               0.1436                     0.4495
 2. cdE               0.0131                     0.3628
 3. cDe               0.0241                     0.4414
 4. cDE               0.1266                     0.3355
 5. Cde/c             0.0062                     0.1845
 6. CdE/c             0.0003                     0.0973
 7. CDe/c             0.3577                     0.1066
 8. CDE/c             0.1299                     0
 9. Cde/C             0.0001                     0.5162
11. CDe/C             0.1973                     0.4448
12. CDE/C             0.0011                     0.4175
Unknown               1.0000                     0.2500


Although this could be done in the present case, it results in algebraic expressions which are complicated and extremely long, and much less convenient to use than the numerical tables which can be prepared for any population from the formulas of table 1. Consequently, the more convenient procedure is to add up from table 2, for a man of given phenotype and using table 3 as a guide, all the children's frequencies which, for each type of mother, exclude paternity. This results in table 4.
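The bookkeeping just described can also be sketched by brute force. The Python code below is our own illustration, not the paper's procedure, and all names in it are hypothetical. Rather than reading exclusions from table 3, it enumerates genotypes over the seven genes (Ry omitted) and builds the mother-child frequencies of table 2 for random fathers. It then counts a mother-child pair as excluding a man when no gene carried by any genotype of the mother's phenotype, combined with any gene carried by the man's phenotype, yields the child's phenotype.

```python
from itertools import product

# Gene -> ((carries C, carries D, carries E), frequency). A gene without C carries c.
# Fisher's four-decimal English frequencies; the rare gene Ry (CdE) is omitted.
GENES = {
    "R2": ((False, True, True), 0.1280),    # cDE, t
    "R0": ((False, True, False), 0.0305),   # cDe, u
    "R''": ((False, False, True), 0.0170),  # cdE, v
    "r": ((False, False, False), 0.3790),   # cde, w
    "Rz": ((True, True, True), 0.0013),     # CDE, x
    "R1": ((True, True, False), 0.4361),    # CDe, y
    "R'": ((True, False, False), 0.0081),   # Cde, z
}


def phenotype(g1, g2):
    """Phenotype read with anti-C, anti-c, anti-D and anti-E (no anti-e), e.g. 'CDe/c'."""
    (c1, d1, e1), (c2, d2, e2) = GENES[g1][0], GENES[g2][0]
    label = ("C" if c1 or c2 else "c") + ("D" if d1 or d2 else "d") + ("E" if e1 or e2 else "e")
    if c1 or c2:
        label += "/C" if c1 and c2 else "/c"
    return label


def exclusion_probabilities():
    """Return (phenotype frequencies, exclusion probability per man's phenotype, overall)."""
    pheno_freq, carried, joint = {}, {}, {}
    for g1, g2 in product(GENES, repeat=2):  # ordered genotypes under random mating
        pheno, f = phenotype(g1, g2), GENES[g1][1] * GENES[g2][1]
        pheno_freq[pheno] = pheno_freq.get(pheno, 0.0) + f
        carried.setdefault(pheno, set()).update((g1, g2))  # genes this phenotype can transmit
        for a in (g1, g2):  # the mother passes each of her genes with probability 1/2
            for b, (_, p_b) in GENES.items():  # the true father is drawn at random
                key = (pheno, phenotype(a, b))  # (mother, child), the analogue of table 2
                joint[key] = joint.get(key, 0.0) + 0.5 * f * p_b
    excl = {}
    for man, man_genes in carried.items():
        # A mother-child pair excludes the man if no gene the mother's phenotype can carry,
        # combined with any gene his phenotype can carry, yields the child's phenotype.
        excl[man] = sum(
            f for (mother, child), f in joint.items()
            if not any(phenotype(a, b) == child for a in carried[mother] for b in man_genes)
        )
    overall = sum(pheno_freq[m] * excl[m] for m in excl)
    return pheno_freq, excl, overall


pheno_freq, excl, overall = exclusion_probabilities()
for man in excl:
    print(f"{man:6s} {pheno_freq[man]:.4f} {excl[man]:.4f}")
print(f"unknown {overall:.4f}")
```

Because the sketch works entirely within the seven-gene model, it also counts the few exclusions that hinge on the absence of Ry, which the text deliberately leaves out of table 3. Those combinations are so rare that the totals should barely move. One would expect the results to come out close to table 4: an overall figure near 0.25, and no exclusion at all for a CDE/c man, whose phenotype can carry any of the seven genes.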

The general chance of excluding the paternity of a falsely accused man in an English (or American) population is thus found to be 0.2500. The value arrived at by Fisher on the basis of approximate gene frequencies rounded to two decimals, namely 0.2520, is thus seen to be remarkably close. The individual probabilities for men of the various Rh phenotypes agree less well. For instance, for a man of type Cde/C Fisher gives 0.5801, as opposed to the 0.5162 found above. The present method could of course be applied to any population.
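The overall figure of 0.2500 is the weighted average of the individual probabilities in table 4, each weighted by the frequency of the man's phenotype. The short Python check below is our own illustration. It reproduces that weighted sum from the rounded table entries, which is why it agrees with 0.2500 only after rounding to four places.

```python
# Frequency of each man's phenotype and his probability of exclusion, copied from table 4.
TABLE_4 = {
    "cde": (0.1436, 0.4495),
    "cdE": (0.0131, 0.3628),
    "cDe": (0.0241, 0.4414),
    "cDE": (0.1266, 0.3355),
    "Cde/c": (0.0062, 0.1845),
    "CdE/c": (0.0003, 0.0973),
    "CDe/c": (0.3577, 0.1066),
    "CDE/c": (0.1299, 0.0),
    "Cde/C": (0.0001, 0.5162),
    "CDe/C": (0.1973, 0.4448),
    "CDE/C": (0.0011, 0.4175),
}

# Chance of exclusion for a man of unknown type: sum of (frequency x probability).
overall = sum(freq * p_excl for freq, p_excl in TABLE_4.values())
print(f"{overall:.4f}")  # prints 0.2500
```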

Wiener (1950) studied 88 cases of exclusion of paternity, and found that the number of cases excluded by the Rh groups alone and by Rh and ABO and/or MN agreed well with the assumption of 25% exclusions by the Rh groups. Allen, Jones and Diamond (1954) have calculated (without, however, deriving general formulas) that if all seven Rh sera (anti-C, anti-Cʷ, anti-c, anti-D, anti-E, anti-e and anti-f) could be used, the chances of exclusion would rise to 35%.


SUMMARY 


A general derivation of the frequency with which children of the various Rh phenotypes are born to mothers of the various phenotypes is given, and from this the probability of excluding the paternity of a falsely accused man is calculated for a population having Rh gene frequencies like those of the English. The method is applicable to any population.
A Genetic Study of Multiple Polyposis of the Colon (With an Appendix Deriving a Method of Estimating Relative Fitness)¹


T. EDWARD REED AND JAMES V. NEEL
Heredity Clinic, Institute of Human Biology, University of Michigan, Ann Arbor, Michigan 


MULTIPLE POLYPOSIS of the colon is a condition characterized by the occurrence of numerous polyps throughout the colon and/or rectum. The polyps usually make their appearance during the first and second decades of life, but occasionally may not arise until the fourth decade or possibly even later. Although the polyps themselves may be associated with symptoms referable to the large bowel, the disease is of clinical significance primarily because of the tendency of one or more of the polyps to become malignant, with death from carcinoma of the colon or rectum at a relatively early age. The disease has long been known to have a familial distribution (cf. Cripps, 1882), and Cockayne (1927) was apparently the first to point out that this distribution was characteristic of a trait dependent on a dominant gene. Dukes (1952) has provided an excellent review and bibliography of the disease.

Approximately 10 per cent of all adults can be demonstrated by combined sigmoidoscopic and X-ray studies to possess one or more polyps of the colon (cf. Helwig, 1947; Swinton and Haug, 1947; Bacon, 1949; Gianturco and Miller, 1953). However, only a small fraction of these persons has true multiple polyposis. When appropriate diagnostic studies are carried out, there is seldom any difficulty in differentiating between the person who has hundreds or even thousands of polyps (and has multiple polyposis) and the person who has two, three, or even five polyps but does not have the multiple polyposis with which this study is concerned.

The two diseases with which multiple polyposis can most readily be confused are the so-called Peutz-Jeghers syndrome of diffuse intestinal polyposis and abnormal pigmentation (cf. Jeghers, McKusick, and Katz, 1949), and the syndrome of polyposis of the colon associated with osteomatosis and fibromatosis (Gardner and Richards, 1953). Both of these diseases are apparently much rarer than classical multiple polyposis, and neither was encountered in the course of this study.

The present investigation was undertaken in an effort to develop a more rounded picture of the genetics of this disease than is currently available. More specifically, in addition to accumulating further data concerning the inheritance of this condition, we have attempted to evaluate the biological handicap it imposes on affected persons, and to estimate the frequency of the trait in the general population. Assuming genetic equilibrium, these data permit certain preliminary speculations concerning the rate at which the gene or genes responsible for this trait are appearing through mutation. Although, as will be pointed out in due course, large errors are inevitable in certain of these calculations, it is felt that to the extent that they help to establish an order of magnitude for a basic biological phenomenon, the calculations are worthwhile and of general interest.


GENETIC STUDIES ON MULTIPLE POLYPOSIS 


The 23 kindreds on which genetic studies have been carried out were located in the following ways:

1. A survey of the records of the University Hospital of the University of Michigan for the period 1935-1950, which yielded 16 families for study. Subsequent experience has revealed that by no means all cases of polyposis seen at the University Hospital during this period were coded as such and, further, that some cases properly coded as polyposis were overlooked in the initial survey. However, so far as is known, no element of bias entered into the selection of these particular kindreds.

2. Correspondence with a number of Michigan physicians specializing in gastroenterology, which yielded three kindreds.

3. Systematic follow-up of all death certificates filed with the state of Michigan during 1950-52 inclusive on which the cause of death is listed as carcinoma of the colon or rectum and the individual was below the age of 40. This procedure, undertaken in an effort to estimate the frequency of the trait (see below), yielded four kindreds.


BASIC DATA 


The basic data on the 23 kindreds studied are presented in Tables 1 and 2. Each kindred contains at least one medically diagnosed case of multiple polyposis. One hundred and nine affected or possibly affected individuals are described. Seventy of these are definitely known to have had multiple polyposis. Thirteen are known to have developed cancer of the colon or rectum and, because of their close biological relationship to an individual known to have polyposis, are assumed also to have had polyposis. The 26 remaining persons are included because of lay reports or inconclusive medical reports of polyposis, bowel cancer, or significant bowel complaints, such as rectal bleeding. There are 65 males and 44 females, and 75 of the 109 individuals were residents of the state of Michigan at the time of investigation or at death.

The kindreds can be conveniently classified into two groups according to the existence or absence of good evidence for the presence of two or more affected individuals in a kindred. This division yields 14 kindreds of the familial type and 9 which are not clearly familial. Pedigrees and data concerned with the 14 familial kindreds are presented in Figure 1 and Table 1; data on the remaining 9 kindreds are given in Table 2. As Table 2 shows, several, or perhaps most, of these 9 kindreds may well contain two or more affected persons, but the data required for a decision are lacking. Poor cooperation from close relatives of the propositus accounts for much of this uncertainty. Particular aspects of Tables 1 and 2 are considered in subsequent sections.

The familial kindreds are represented in the pedigrees and table as far as the data 
seem to warrant. It should be emphasized that the nature of the trait is such that 
pedigrees alone represent only a portion of the pertinent data; for more complete 
information the table should be consulted. Two of the 14 familial kindreds are of 
particular interest. Kindred 1826 has recently been described by Neel, Bolt, and 
Pollard (1954) and is noteworthy for its size, including 17 medically diagnosed 
cases of multiple polyposis. Another kindred, 1801, is remarkable for having a sibship 
of four persons (III, 9-12), all of whom have or have had diagnoses of multiple 
polyposis and/or cancer of the large bowel, three of them dying of cancer of the bowel 
under the age of 20 (at ages 9, 18, and 19). The earliest age at death from cancer of 
the large bowel among the other 22 kindreds is 26. The other unusual feature of this 
kindred is that in addition to diagnosed multiple polyposis in the mother of this 
sibship, the family physician reports that the father, II-2, was found at exploratory 
laparotomy to have cancer involving the stomach and transverse colon, appearing to 
be primary in the stomach, and the father’s father had cancer of the colon. Both men 
died at age 37. In addition, the father’s paternal grandfather is reported to have 
died at about the age of 35 of unknown causes. Since in none of the other 13 familial 
kindreds is there an affected sibship having one parent with multiple polyposis and 
the other with a history comparable to the above mentioned one, the question arises 
whether this parental history is related to the three early deaths in the sibship. It is conceivable that the father's cancer was primary in the colon like that of his own father, both arising from multiple polyposis, in which case some or all of the three early deaths in sibship III, 9-12 may have been of persons homozygous for the polyposis gene. This conjecture can neither be proved nor disproved at present.


One kindred (2067) of the nine described in Table 2 also deserves special comment because it illustrates the diagnostic difficulties which occasionally arise. At age 6 the propositus came to medical attention because of a mass protruding from the rectum; this was found to be a prolapsed polyp. After sigmoidoscopic and X-ray studies, a surgeon made a diagnosis of multiple polyposis and performed a hemicolectomy with anastomosis of the mid-transverse colon and distal sigmoid. The pathologist's report on the specimen reads as follows: "Specimen consists of a 40 cm. segment of colon with attached mesentery and a separate pedunculated polypoid granular reddish-grey lesion about 1 cm. in gross diameter. On section, the wall is essentially normal in thickness. The mucosa of the specimen contains 5 reddish-grey granular lesions varying from 5 to 10 mm. in gross diameter. Three of the specimens have very long, soft pedicles. . . ." Microscopic sections of these lesions were typical of polyps of the colon. The boy was seen by us at age 10; sigmoidoscopic examination revealed no polyps, and two barium enemas, although demonstrating a shortened colon, likewise failed to provide evidence for polyposis. A brother, aged 7, his father, aged 38, and his mother, aged 31, were all negative to sigmoidoscopy and barium enema. The maternal grandmother underwent an exploratory laparotomy at about age 60 and was found to have "generalized metastatic adenocarcinoma of the abdominal cavity" (hospital report); a maternal great aunt is reported by a physician to have undergone surgery because of carcinoma of the bowel at age 67. While there is no doubt that the propositus had multiple polyps of his distal colon, the complete absence of polyps in the remaining large bowel at the time of our examination raises doubt as to whether this is the type of multiple polyposis with which this study is otherwise concerned. It should be noted that the inclusion of this dubious case in the series does not affect any of the calculations to be presented below.
Fig. 1. Pedigrees of kindreds which contain two or more individuals with multiple polyposis. [Pedigree charts not reproduced. The key distinguishes: medical diagnosis of multiple polyposis, with or without cancer; medical diagnosis of primary cancer of the large bowel, polyposis not diagnosed; reliable lay report of cancer of the large bowel or of polyposis; dubious lay report of intestinal symptoms, cancer of the bowel, or polyposis, or medical diagnosis of death from peritonitis, intestinal obstruction or metastatic cancer of the liver; males and females not reported to have intestinal symptoms, cancer of the bowel, or polyposis; unknown; normal at time of medical examination for polyposis; consanguineous marriage; propositus.]


INHERITANCE 


The published pedigrees of multiple polyposis clearly indicate that this trait is 
usually, if not always, determined by a dominant gene of fairly high penetrance. The 
distribution of affected persons within the kindreds of the present study is in keeping 
with the reports in the literature, medically diagnosed polyposis occurring in two 
generations of 10 kindreds and in three generations of one. 

Unfortunately, a precise calculation of the proportion of affected and unaffected sibs within sibships having a parent with polyposis, desirable as a check on the hypothesis of dominant inheritance and on the degree of penetrance under this hypothesis, is not possible. The late onset of symptoms and diagnosis in some individuals (about 10 per cent of the medically diagnosed cases are first diagnosed over age 45) and the lack of diagnostic information on a number of individuals account for the fact that not one sibship in the 14 familial kindreds gives critical information on the segregation of the gene for polyposis. However, it is possible to obtain a rough test for agreement with the expected 1:1 ratio in segregating sibships by using only sibships in which persons reported to be normal are well within the age of manifestation of the trait, and, at the same time, employing conservative criteria in classifying individuals with respect to the trait. In this calculation, two kinds of sibships (that of the propositus and, where possible, that of his affected parent) have been utilized; the propositus and affected parent have been excluded from the calculation. Such a test, in which the minimum age of "normal" sibs was set at 41 (except for one individual, Kindred 1826, III-24, described below), and in which sibs reported as affected by "reliable" lay sources were counted affected and sibs dubiously reported affected were counted normal, was applied to 9 sibships (Kindred 809, II, 1-5; Kindred 832, II, 1-11; Kindred 1554, I, 1-7; Kindred 1554, II, 6-9; Kindred 1801, III, 9-12; Kindred 1826, II, 1-8; Kindred 1826, III, 1-9; Kindred 1826, III, 20-30, counting III-24, who died at 32 of "epilepsy", as normal; Kindred 2070, II, 1-2). It showed 24 affected persons and 36 unaffected, a proportion not differing significantly from 1:1. If there is 1:1 gene segregation and reduced penetrance, the observed proportion deviates from 1:1 in the expected direction. It seems a reasonable assumption that multiple polyposis is determined by a single gene whose penetrance is of the order of 90 per cent in persons of age 50, and higher in still older persons.
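The text does not say which significance test was applied to the 24:36 split. The Python sketch below assumes a simple chi-square goodness-of-fit test against a 1:1 ratio with one degree of freedom, which is one standard choice. For one degree of freedom the upper-tail probability equals erfc(sqrt(chi²/2)), so only the standard library is needed.

```python
import math


def chi_square_1to1(affected, unaffected):
    """Chi-square goodness-of-fit test of a 1:1 ratio (1 degree of freedom)."""
    expected = (affected + unaffected) / 2
    chi2 = ((affected - expected) ** 2 + (unaffected - expected) ** 2) / expected
    # For 1 d.f. the chi-square upper tail equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


chi2, p = chi_square_1to1(24, 36)
print(f"chi-square = {chi2:.2f}, P = {p:.2f}")  # chi-square = 2.40, P = 0.12
```

A P of about 0.12 is consistent with the statement that the observed proportion does not differ significantly from 1:1.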


FREQUENCY OF MULTIPLE POLYPOSIS 


An attempt was made to estimate the frequency at birth of individuals heterozygous for the gene for multiple polyposis. This frequency cannot be determined directly, but an approximate estimate can be based on the proportion of individuals dying within some time interval who have had multiple polyposis. This approximation is biased toward underestimating the frequency at birth, because some persons with the gene for polyposis will not be recognized at their death, dying either of cancer secondary to polyposis without diagnosis of polyposis, or from some other cause. A further bias, in the opposite direction, exists when this method is applied to American populations. This second bias is a consequence of the increasing absolute number of births per year in the U.S.A. together with the decreased life expectancy of individuals with the gene; its effect is to overestimate the true frequency at birth. It is not possible to make any adequate allowance for these biases, but the former would seem to be the more important, so the estimate is probably an underestimate.

The frequency estimate was obtained in the following manner. The Department of Health of the state of Michigan furnished copies of the death certificates of all persons who died in Michigan before age 40 during the three-year period 1950-52 from primary carcinoma of the colon or rectum. One hundred and two such certificates were on file. On 25 certificates it was stated that an autopsy had been performed; the findings on these certificates were accepted as final. An effort was made to contact the next-of-kin of each of the remaining 77 deceased persons to obtain permission for the release of medical information to the Heredity Clinic. After receiving such permission, letters were written to the deceased's physician and to the hospitals where the deceased was studied, requesting copies of his medical records in order to determine whether he had multiple polyposis. In 59 cases medical reports were obtained, and in 18 they were not. This procedure yielded 4 persons with multiple polyposis (Kindreds 3496, 3498, 4029, and 4103); only one (Kindred 4103) of these 4 persons had a death certificate which failed to state that multiple polyposis was present. The proportion of known cases of multiple polyposis is therefore 4/102, or 0.039 ± 0.019.

It soon became apparent that a frequency estimate based on this proportion would be a minimal one. Thus, some death certificates fail to mention multiple polyposis even when the person's hospital records do. Furthermore, in 3 of the 23 kindreds included in this study, the hospital records of persons who were parents of two or more children who themselves had polyposis stated only that cancer of the large bowel was present. It is almost certain that these three persons (see Pedigrees 832, 1564, and 1982) had the gene and trait of multiple polyposis. In order to obtain a more reliable estimate of the proportion of persons dying before age 40 from primary cancer of the large bowel secondary to multiple polyposis, a survey was made of all the records of persons dying before age 40 from primary cancer of the rectum and colon who were studied at the University Hospital, Ann Arbor, in the period 1935-1944. Of 58 such persons, 6 had definite multiple polyposis and 1 had questionable multiple polyposis. Counting only the 6 definite cases, the proportion is 6/58, or 0.103 ± 0.040. It is clear that this may be an underestimate of the true proportion.²
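The ± figures quoted for these proportions match the usual binomial standard error sqrt(p(1 - p)/n). The text does not name the formula, so this is our assumption. The short Python sketch below reproduces them, together with the value 0.495 ± 0.052 used for c = 45/91 in the estimate that follows.

```python
import math


def proportion_with_se(k, n):
    """Observed proportion k/n and its binomial standard error sqrt(p(1 - p)/n)."""
    p = k / n
    return p, math.sqrt(p * (1 - p) / n)


death_certs = proportion_with_se(4, 102)  # Michigan death certificates, 1950-52
hospital = proportion_with_se(6, 58)      # University Hospital records, 1935-1944
c_value = proportion_with_se(45, 91)      # c, taken from Table 6 of the paper
for label, (p, se) in [("death certificates", death_certs),
                       ("hospital records", hospital),
                       ("c", c_value)]:
    print(f"{label}: {p:.3f} ± {se:.3f}")
```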

In a population in equilibrium, the frequency (f) at birth of individuals with the gene for polyposis is equal to the frequency, among all deaths in a specified time interval, of individuals dying with the gene. If, for a specified population,

T = the specified time interval
P = the number of individuals with the gene for polyposis dying in T
D = the total number of deaths of all individuals in T
a = the observed proportion of individuals, among those dying before age 40 from primary cancer of the large bowel, whose cancer is secondary to multiple polyposis
b = the observed number of individuals dying before age 40 from primary cancer of the large bowel in T
c = the observed proportion of individuals, among those dying from cancer of the large bowel secondary to multiple polyposis, who die before age 40

then an approximation to P is given by ab/c: ab estimates the number of deaths before age 40 from cancer secondary to polyposis, and dividing by c scales this up to such deaths at all ages. This estimate of P will be too low, since some persons with the gene for polyposis fail to die from, or are not recognized as having died from, cancer secondary to multiple polyposis. For T we have used the three-year interval 1950-52. From the preceding section, a is 0.103 ± 0.040; b for the state of Michigan is 102; c from Table 6 is 45/91, or 0.495 ± 0.052; D for the state of Michigan is 175,842. The frequency at birth is then approximated as


$$
f = \frac{P}{D} \approx \frac{ab}{cD} = \frac{(0.103 \pm 0.040)(102)}{(0.495 \pm 0.052)(175{,}842)} \approx (1.21 \pm 0.49) \times 10^{-4},
$$

or 1 in 8,300 individuals. The expression ab/c neglects individuals with the gene for polyposis who die of causes other than cancer secondary to polyposis, and a may well be an underestimate of the true proportion of individuals, among those dying before age 40 from primary cancer of the large bowel, whose cancer is secondary to polyposis. Therefore the true value of f is probably higher than the calculated value. It should be noted that because persons with the gene for polyposis have a decreased life expectancy, the frequency of such persons in the general population will be less than (approximately two-thirds of) that at birth.

² An observation which may be pertinent here is that the frequency distribution of age at death from cancer of the large and small intestine (almost entirely due to the large intestine) shows a noticeable "bump" at the 30-34 year interval relative to the corresponding distribution for cancer of the stomach (data from the Michigan Department of Health over the interval 1933-1945 inclusive). Multiple polyposis will contribute to the former distribution but not to the latter, so it seems possible that the early age at death subsequent to polyposis may be responsible for this bump. If such is the case, and if in this age range and in the absence of polyposis the two frequency curves are proportional, one can estimate that cancer following polyposis may account for as much as a quarter of all cancer of the colon under age 40.
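The arithmetic of the estimate f ≈ ab/(cD) can be sketched in a few lines of Python. The text does not say how the ± 0.49 was obtained. The sketch assumes first-order error propagation, combining the relative standard errors of a and c in quadrature and treating b and D as exact. That assumption does reproduce the quoted figure. The function name `birth_frequency` is hypothetical.

```python
import math


def birth_frequency(a, se_a, b, c, se_c, D):
    """f ~ ab/(cD), with a first-order standard error from the relative errors of a and c."""
    f = a * b / (c * D)
    rel_se = math.sqrt((se_a / a) ** 2 + (se_c / c) ** 2)  # b and D treated as exact
    return f, f * rel_se


f, se = birth_frequency(a=0.103, se_a=0.040, b=102, c=0.495, se_c=0.052, D=175_842)
print(f"f = ({f * 1e4:.2f} ± {se * 1e4:.2f}) x 10^-4, about 1 in {round(1 / f, -2):,.0f}")
# f = (1.21 ± 0.49) x 10^-4, about 1 in 8,300
```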


RELATIVE FITNESS OF INDIVIDUALS WITH MULTIPLE POLYPOSIS 


Fitness, in a population sense, is measured by ability to produce children and is a function of viability (from birth through the reproductive period) and fertility (in the narrow sense, once the reproductive period is reached). The fitness of a class of individuals relative to that of another class may, under ideal conditions, be measured by the ratio of the expectation at birth of the number of live-born children to be produced by a live-born individual of the first class to the corresponding expectation for the second class. If the first class is composed of heterozygotes for a rare dominant gene which lowers fitness, such as that for multiple polyposis, and the second class is the remainder, all normal in this respect, of a population in equilibrium, this estimate of relative fitness (W) is also the proportion of the dominant genes transmitted from one generation to the next. An estimate of W for multiple polyposis will measure the degree of natural selection for or against bearers of the gene and is required in the indirect estimation of the mutation rate. Two independent methods of calculating W are available.

The direct calculation of relative fitness from the observed reproductive perform- 
ance of affected persons and their normal sibs is complicated by the late manifesta- 
tions of multiple polyposis in some individuals. Depending on the method of ascer- 
tainment of the data, a possible further complication is the tendency of members of 
large families to have more children than members of small families (Fisher, 1930). 
The first difficulty is somewhat reduced by using only sibships whose apparently 
normal members were all over 40 years of age at the time of investigation or at death. 
Six such sibships are available; they are described in Table 3. It is clear that appear- 
ance of polyposis in persons after 40 can make this estimate of W an underestimate 
since, on the average, these persons are very probably more fertile than persons 
affected before 41. In theory, this restriction introduces a possible bias from deaths 
before age 41 of individuals lacking the gene for polyposis. In our sample, however, 
there were no such deaths among the apparently normal members of these sibships, 
thus eliminating this source of bias. Unfortunately, the 6 sibships are heterogeneous 
with regard to ascertainment, 2 containing a propositus, 2 containing one parent of a 
propositus, 1 containing two parents of propositi, and 1 containing first cousins of the 
propositus. Lacking a suitable weighting procedure to correct for ascertainment and 
sibship size frequency, it seems best to calculate W simply as the ratio of the mean 
number of children from persons affected with polyposis to the corresponding mean 
for the apparently normal sibs, using pooled data of the six sibships. From the totals 
of Table 3, the mean for affected persons is 45/18 = 2.50 and that for “normal” 
persons is 78/23 = 3.39, giving an estimate of W of 0.74. If, in fact, there is no
difference in the means, the probability of obtaining an estimate as low as or lower
(single-tail test) than the one observed is 0.05.

TABLE 3. DESCRIPTION OF SIBSHIPS USED IN DIRECT CALCULATION OF RELATIVE FITNESS
All "normal" individuals were over age 40 at time of investigation or at death. See "Remarks"
for details on omission of certain individuals.

Sibship | Size | Affected: usable number | Affected: no. of children | "Normal": usable number | "Normal": no. of children | Remarks
1554 (II, 6-9) | 4 | 3 | 8 | 1 | 2 | All persons usable; first cousins of the propositus.
809 (II, 1-5) | 5 | 2 | 3 | 2 | 4 | II-1, parent of propositus, excluded.
1554 (I, 1-7) | 7 | 2 | 6 | 4 | 10 | I-7, parent of propositus, excluded.
1826 (II, 1-8) | 8 | 2 | 9* | 4 | 12 | II-1 and II-5, parents of propositi, excluded.
1826 (III, 20-30) | 11 | 5† | 16 | 4 | 26 | III-24 and III-26 excluded because of uncertain status with regard to polyposis. A propositus is included.
1862 (II, 1-12) | 12 | 4‡ | 3 | 8 | 24 | All persons usable; the propositus is included.
Total | | 18 | 45 | 23 | 78 |

* Twins counted as one individual.
† III-27 counted as affected.
‡ II-4 counted as affected.
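As an arithmetic check, the pooled ratio of means can be written out as a short Python sketch. The function name `pooled_relative_fitness` is ours, not the authors'; the inputs are the Table 3 totals (18 usable affected persons with 45 children, 23 "normal" sibs with 78 children), and, as in the text, no correction is made for ascertainment or sibship size.

```python
def pooled_relative_fitness(affected_persons, affected_children,
                            normal_persons, normal_children):
    """Ratio of the pooled mean number of children of affected persons to that of
    their apparently normal sibs (no ascertainment or sibship-size correction)."""
    mean_affected = affected_children / affected_persons
    mean_normal = normal_children / normal_persons
    return mean_affected, mean_normal, mean_affected / mean_normal


# Totals from Table 3.
mean_aff, mean_norm, w = pooled_relative_fitness(18, 45, 23, 78)
print(f"{mean_aff:.2f} {mean_norm:.2f} {w:.2f}")  # 2.50 3.39 0.74
```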

The second method for estimating relative fitness is indirect and requires the 
assumptions that (1) the only effect on W of the gene for multiple polyposis is through 
the death, from cancer secondary to polyposis, before the end of the reproductive 
period, of some affected individuals, and (2) that until their death persons with the 
gene reproduce at the same rate as persons lacking the gene. There appear to be no 
data suggesting that the first assumption is incorrect with regard to the biological 
action of the gene, although the possibility of pleiotropic effects must always be 
considered. The second assumption, judging from the present data, appears to be 
reasonable for the period up to the time of first bowel complaint. The possibility 
exists that some children of affected parents will restrict their reproduction when 
they are concerned about the appearance of polyposis in their own descendants. 
Such restriction was not apparent in our data. Reproductive capacity is, of course, 
impaired in the interval between onset of symptoms and death. The bias resulting 
from making the second assumption, however, appears to be only several percent. 
This bias is discussed below. 

This estimate of W, then, will actually be a function of relative survival to and 
through the reproductive period. The quantity, which we here equate with W, we 
may term the “relative reproductive span” or RRS.* 


* Although the use of RRS as a measure of fitness appears permissible in the present context, it
should be recognized that for many dominant traits this is not the case. Thus, Crowe, Schull, and
Neel (in press) find that although part of the effect of the gene responsible for neurofibromatosis on
fitness is exerted through the early death of a few individuals with the trait, the major effect is a
decreased marriage rate on the part of persons with the trait, as well as impaired fertility after
marriage. At the other extreme, Panse (1942) and Reed and Palm (1951) have suggested that despite the
early death of some persons with Huntington's chorea, the net fertility of affected persons is actually
greater than normal.

$p_x$ = the proportion of births, among all births, which occur to parents of age $x$
(mean of values of paternal and maternal age distributions),

$l_x$ = the proportion of all live-born individuals with the gene for polyposis who
survive to age $x$,

$L_x$ = the proportion of all live-born individuals lacking the gene for polyposis who
survive to age $x$, and

$d_i$ = the proportion of deaths, among all deaths from cancer secondary to polyposis,
which occur at age $i$,

then

$$\mathrm{RRS} = \sum_x p_x \frac{l_x}{L_x} \approx \sum_x p_x \left(1 - \sum_{i=0}^{x-1} d_i\right),$$

the summation extending over the longest life span. These relations are derived in the
appendix.

To obtain estimates of $d_i$, both the data of the present study and the excellent
study of Dukes (1952) were used. Utilizing only individuals who died at a known age
from medically diagnosed cancer of the large bowel and who either were medically
diagnosed as having multiple polyposis or were close relatives of persons so
diagnosed, 29 ages at death were obtained from the present study and 62 from that
of Dukes. The period over which these deaths occurred was 1916-1953, with a mean
at 1940.4, for the present study, and 1882-1951, with a mean at 1930.2, for the 59
deaths of Dukes' study for which data are given. There was one other death before
1900 among these 59. The distribution of these ages is given in Table 4 and the means,
standard deviations, and standard errors in Table 5. It is seen that the means and
variances of males and females do not differ significantly within the two studies, nor
do the means and variances differ between the studies. It was therefore considered
appropriate to combine all data, resulting in a mean age at death of 40.21 ± 1.23 years and
a standard deviation of 11.77 years. In spite of a marked dip at the 35-39 year interval,
the combined distribution does not differ significantly from that expected of a normal
curve with the same mean and variance. To reduce the effects of chance fluctuation
in the proportions of deaths in the 5-year age intervals, it seemed advisable to
estimate $d_i$ from the normal curve.³ Values of $d_i$ through the reproductive period are

³ It is recognized that it is unlikely that the distribution is normal. More extensive data would
doubtless make this apparent and would also permit a decision as to whether or not the two observed
peaks in Table 4, at 30-34 years and 45-49 years, are real. At the present stage of our knowledge
the assumption of normality, for purposes of calculation, seems justified by the symmetry of the mean
with respect to the extremes and by the similarity of the distributions around the modes of age-at-death
curves for various diseases. The mean age at death in polyposis of 40 years, with extremes at
about 10 and 70 years, suggests a symmetrical distribution, while the bell-shaped distributions around
the mode found in many diseases, including various cancers, suggest that the normal distribution is
not unreasonable.
TABLE 4. DISTRIBUTION OF AGES AT DEATH OF PERSONS DYING FROM CANCER OF THE COLON OR
RECTUM ARISING FROM MULTIPLE POLYPOSIS (CANCER MEDICALLY DIAGNOSED)†
Present study and Dukes, 1952

Age at death (years) | Present study* | Dukes, 1952 | Total
0-4 | 0 | 0 | 0
5-9 | 1 | 0 | 1
10-14 | 0 | 0 | 0
15-19 | 2 | 0 | 2
20-24 | 0 | 3 | 3
25-29 | 3 | 6 | 9
30-34 | 7 | 14 | 21
35-39 | 1 | 8 | 9
40-44 | 4 | 8 | 12
45-49 | 6 | 9 | 15
50-54 | 3 | 5 | 8
55-59 | 0 | 6 | 6
60-64 | 1 | 2 | 3
65-69 | 0 | 1 | 1
70-74 | 1 | 0 | 1
75+ | 0 | 0 | 0

* Omitting the 4 persons selected for having died under 40 years of age. Multiple polyposis
either medically diagnosed or inferred from the existence of a parent or child with medically
diagnosed multiple polyposis.

† All individuals are members of kindreds containing at least one medically diagnosed case of
multiple polyposis. Diagnosis of cancer established by one of the following: medical or death
certificate, medical examination, hospital record.


given in Table 6. Values of $p_x$ for Michigan are available from the vital statistics of the
U. S. Bureau of the Census for 1934 and later years. In order to make the data for
$p_x$ and $d_i$ more comparable, $p_x$ for Michigan in 1935 was used. The values of $p_x$ and
the calculation of the relative reproductive span are given in Table 6. This estimate
is 0.78, slightly higher than the previous estimate of W of 0.74 obtained from the
observed reproductive performance of 6 sibships. This latter estimate is considered
less reliable than the RRS estimate because (1) it is based on few data, (2) the
diagnoses of a number of individuals are not certain, and (3) no method of correcting for
ascertainment and sibship size is apparent.
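The Table 6 calculation can be reproduced in a short Python sketch: $d_i$ is taken from a normal curve with the combined mean (40.21 years) and standard deviation (11.77 years) of age at death, the proportion of deaths "up to the mid-value" of each interval is approximated by adding half of that interval's deaths to all earlier ones, and RRS is the $p_x$-weighted sum. The open-ended treatment of the "0-14" and "55+" classes and the half-interval rule are our reading of the table, not a procedure stated by the authors; with them, the columns of Table 6 are matched to within about 0.002, and small differences in $d_i$ presumably reflect rounding in the original.

```python
import math

# Combined mean and standard deviation of age at death (years), from the text and Table 5.
MEAN_AGE, SD_AGE = 40.21, 11.77

# Age intervals of Table 6 as (lower, upper) bounds in years. Treating "0-14" as
# everything below 15 and "55+" as everything from 55 up is our reading of the table.
BOUNDS = [(-math.inf, 15), (15, 20), (20, 25), (25, 30), (30, 35),
          (35, 40), (40, 45), (45, 50), (50, 55), (55, math.inf)]

# p_x: share of all births occurring to parents in each interval (Michigan, 1935; Table 6).
P_X = [0.000, 0.063, 0.250, 0.275, 0.200, 0.125, 0.060, 0.019, 0.006, 0.002]


def normal_cdf(age, mu=MEAN_AGE, sigma=SD_AGE):
    """Cumulative normal distribution of age at death, via the error function."""
    return 0.5 * (1.0 + math.erf((age - mu) / (sigma * math.sqrt(2.0))))


def table6_columns():
    """Return (d_i per interval, 1 - cumulative deaths up to each interval's mid-value)."""
    d_i, survival = [], []
    deaths_before = 0.0
    for lower, upper in BOUNDS:
        d = normal_cdf(upper) - normal_cdf(lower)
        d_i.append(d)
        # "Up to the mid-value" is approximated by adding half of the interval's deaths.
        survival.append(1.0 - (deaths_before + d / 2.0))
        deaths_before += d
    return d_i, survival


def relative_reproductive_span():
    """RRS = sum over intervals of p_x * (1 - sum of d_i up to the mid-value)."""
    _, survival = table6_columns()
    return sum(p * s for p, s in zip(P_X, survival))


print(f"RRS = {relative_reproductive_span():.2f}")  # RRS = 0.78
```

Using the normal curve rather than the raw Table 4 proportions is the smoothing choice described in the text; substituting observed proportions would only require replacing `normal_cdf`.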

Two known biases of this estimate should be considered. One bias is in the formula
for the RRS. The derivation of the formula used shows that it slightly underestimates
the true value of $l_x/L_x$, and hence of RRS. A bias in the opposite direction results from
the assumption that reproduction continues until death from cancer secondary to
polyposis. Among the 29 cases of the present study used in estimating the mean age
at death, data were available in 17 cases on the interval between onset of significant
bowel complaints and death from cancer. This interval varied from less than one
year to 14 years, with a mean of 3.2 years. Of these 17 cases, 10 died under age 40;
their intervals varied from less than one year to 8 years, with a mean of 2.8 years.
Since, on the average, reproductive capacity does not end with onset of bowel complaints
but declines steadily from that time, the above assumption involves an
error of a year or so. A method based on this assumption will overestimate the true
value by a few percent. It is not possible to say whether these two biases will cancel
out. The relative fitness at present may be higher than the calculated value, since
this value depends on deaths with a mean around 1935. Better diagnosis and greater
use of radical surgical procedures in the future treatment of multiple polyposis may
be expected to increase the life expectancy of affected individuals and so raise the
relative fitness.

TABLE 5. AGE AT DEATH FROM CANCER OF THE COLON OR RECTUM OF PERSONS WHOSE CANCER
AROSE FROM MULTIPLE POLYPOSIS (CANCER MEDICALLY DIAGNOSED)
Present study and Dukes, 1952

Source* | Individuals | Number of deaths under 40 years | Number of deaths, all ages | Mean age at death | Standard deviation | Standard error
Present study | Males | 6 | 15 | 42.00 | 13.76 | 3.55
Present study | Females | 8 | 14 | 35.57 | 11.54 | 3.08
Present study | Males and females | 14 | 29 | 38.90 | 12.93 | 2.40
Dukes, 1952 | Males | 18 | 34 | 40.24 | 10.59 | 1.82
Dukes, 1952 | Females | 13 | 28 | 41.54 | | 2.20
Dukes, 1952 | Males and females | 31 | 62 | 40.82 | 11.25 | 1.43
Present study and Dukes, 1952 | Males and females | 45 | 91 | 40.21 | 11.77 | 1.23

* See Table 4 for further description.

TABLE 6. CALCULATION OF THE RELATIVE REPRODUCTIVE SPAN (RRS). SEE TEXT FOR SYMBOLS

Age (years) | $p_x$ | $d_i$ | $1 - \sum_{i=0}^{x-1} d_i$*
0-14 | .000 | .016 | .992
15-19 | .063 | .027 | .970
20-24 | .250 | .0555 | .929
25-29 | .275 | .0935 | .855
30-34 | .200 | .138 | .739
35-39 | .125 | .162 | .589
40-44 | .060 | .167 | .425
45-49 | .019 | .138 | .272
50-54 | .006 | .099 | .153
55+ | .002 | .104 | .052

* $\sum_{i=0}^{x-1} d_i$ as used here represents the proportion of deaths up to the mid-value of each age
interval.

$$\mathrm{RRS} = \sum_{x=0}^{100} p_x \left(1 - \sum_{i=0}^{x-1} d_i\right) = 0.78$$
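The combined row of Table 5 can be checked from the two study rows: the combined mean is the death-weighted average of the two study means, and each standard error is the standard deviation divided by the square root of the number of deaths. A minimal Python sketch (function names are ours); the combined standard deviation of 11.77 is taken as given, since recomputing it would need the within-group sums of squares.

```python
import math


def pooled_mean(groups):
    """Weighted mean of group means; groups is a list of (n, mean) pairs."""
    total_n = sum(n for n, _ in groups)
    return sum(n * mean for n, mean in groups) / total_n


def standard_error(sd, n):
    """Standard error of a mean from its standard deviation and sample size."""
    return sd / math.sqrt(n)


# (number of deaths, mean age at death) for the present study and for Dukes, from Table 5.
combined_mean = pooled_mean([(29, 38.90), (62, 40.82)])
combined_se = standard_error(11.77, 91)
print(f"{combined_mean:.2f} {combined_se:.2f}")  # 40.21 1.23
```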
CALCULATION OF THE MUTATION RATE ESTIMATE


Because of the late appearance of polyps, and symptoms subsequent to polyps, in 
some individuals (e.g., Dukes (1952) studied an individual who at 38 was normal to 
sigmoidoscopy but at 44 had polyps and at 56 developed carcinoma of the colon; in 
the present study III-5 in Kindred 1826 first had intestinal complaints at age 57; 
polyposis was diagnosed at 58), it is not possible, at present, to be certain that any 
individual will not later develop polyposis, although it seems quite unlikely that 
polyps will first appear after age 50. Therefore, it is not feasible to calculate a direct 
estimate of the mutation rate. Only one kindred (1963) of the present study presents 
a reasonable case for mutation. The propositus of this kindred has 9 sibs whose ages 
range from 25 to 40 and both parents are living at age 69; all are reported free of 
significant intestinal complaint. Several other kindreds may demonstrate mutation 
but again no proof can be offered. At present we are forced to rely on an indirect 
calculation of the mutation rate. 

If the mutation rate/gene/generation is m and the population is in equilibrium
with respect to production and loss of genes for polyposis, the following customary
equation applies:

$$m = \frac{f(1 - W)}{2}$$

The estimate of W from the relative reproductive span is more reliable and will be
used here. Substitution of the values for f and W gives

$$m = \frac{1.21 \times 10^{-4}\,(1 - 0.78)}{2} = 1.3 \times 10^{-5}.$$

Since f is probably an underestimate, the value of m may, perhaps, actually be up to
twice this value. The probable bias of W is not known. Considering the bias of f, it
seems unlikely that the true value of W would be such that m would be less than
about three-fourths of the calculated value. m probably lies within the range 1-3 ×
10⁻⁵ mutations/gene/generation.
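The substitution, and the range implied by the bias discussion, can be reproduced in a few lines of Python. The function name is ours; f = 1.21 × 10⁻⁴ and W = 0.78 are the estimates given above, and the bounds (three-fourths of, and twice, the point estimate) follow the preceding paragraph.

```python
def indirect_mutation_rate(f, w):
    """m = f(1 - W)/2. At equilibrium, mutation replaces the fraction (1 - W) of the
    dominant genes lost each generation; f counts affected heterozygotes, each carrying
    one of the two genes at the locus, hence the factor 1/2."""
    return f * (1.0 - w) / 2.0


m = indirect_mutation_rate(1.21e-4, 0.78)
low, high = 0.75 * m, 2.0 * m  # bounds suggested in the text for the biases of f and W
print(f"m = {m:.2e}, range {low:.1e} to {high:.1e}")  # m = 1.33e-05, range 1.0e-05 to 2.7e-05
```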


DISCUSSION 


The present study is one of a series of investigations on mutation rates carried out 
by this Clinic. Some of the problems inherent in such investigations, as they have 
impressed themselves on us, have been discussed elsewhere (Neel, 1952; Neel and
Schull, 1954; see also Haldane, 1949; Nachtsheim, 1954; and Vogel, 1954). In addition
to certain methodological questions which are common to all mutation rate
studies, each trait selected for study has raised particular difficulties more or less 
unique to that trait. Thus, in the case of multiple polyposis, we are confronted with 
the fact that it is very difficult to demonstrate that any particular “sporadic” case is 
due to mutation. As a consequence, no use can be made of the “direct” method of 
estimating mutation rate (i.e., from the observed frequency of sporadic cases). Both 
the necessity of performing sigmoidoscopic and X-ray studies on the parents of 
affected persons and the late appearance of polyps and symptoms in some individuals 
preclude even an approximate direct determination of mutation rate. In passing, we
may note that although we recognized the unpleasant nature of these diagnostic 
studies, we were unprepared for the lack of cooperation sometimes encountered, 
especially since the studies were so obviously to the advantage of the person being 
investigated. 

Since the direct method was not feasible, we have employed the indirect approach, 
based on estimates of frequency and relative fitness. The estimation of the relative 
fitness of affected individuals proved to be difficult, partly because it approaches that 
of the general population. The observed relative fitness of about 0.8 is appreciably 
higher than that of most other dominant traits for which mutation rate estimates 
exist. The most critical assumption underlying the use of the indirect method, in this 
and other studies, is that genetic equilibrium obtains, i.e., that the loss of genes 
(for the trait in question) in each generation is balanced by the appearance of new 
genes through mutation. While there are on record a number of kindreds highly sug- 
gestive of the occurrence of mutation with respect to the gene for multiple polyposis 
(e.g., our Kindred 1963; families 12 and 21, Dukes, 1952; 1 kindred, Gardner and 
Woolf, 1952), the assumption that the polyposis locus is in genetic equilibrium be- 
cause of mutation from the normal allele to the gene for multiple polyposis is less 
tenable here than in the case of some other dominant traits which have been utilized 
in mutation rate studies. On the other hand, the proportion of propositi whose disease 
is not clearly inherited in both this study and that of Dukes (1952), while obviously 
an overestimate of the true proportion of sporadic cases, is in keeping with the 
hypothesis that in each generation about one-quarter of the polyposis genes must
arise through mutation if genetic equilibrium exists. 

A few general remarks concerning the philosophy of this Clinic with regard to 
mutation rate studies are perhaps appropriate at this point. Almost every mutation 
rate estimate advanced to date—not excluding our own—can be subjected to severe 
criticism. It seems not only possible but probable that many of the existing estimates 
err by a factor of two or even more. At this stage in our developing appreciation of 
the problem, this does not seem to us a serious deterrent to such studies. The present 
challenge is to fix the order of magnitude of the phenomenon in a relatively long-lived 
animal, by a series of studies on as many traits as possible. Later, as techniques 
improve and the outlines of the problem become clearer, greater accuracy will be 
possible, as will a comparison of human mutation rates with those of other forms. 
The present estimate of 1-3 × 10⁻⁵ mutations/gene/generation falls well within the
range of other available estimates, all of which, of course, assume that only a single 
locus is involved. The dangers inherent in attempting to generalize at the present 
time from this and the other existing estimates to all genes have been discussed
elsewhere (Neel and Schull, 1954). On the other hand, each new estimate strengthens 
the foundation of fact from which generalizations may someday be possible. 


CONCLUSIONS AND SUMMARY 


In a study of the genetics of multiple polyposis of the colon, a rare dominant trait, 
special emphasis has been given to the estimation of the frequency and relative 
fitness of individuals bearing the gene, and of the mutation rate of the gene. The 
material of this study consists of 23 kindreds, including 70 certainly and 13 probably
affected persons. Fourteen of the 23 kindreds contain two or more affected members.
The mean age at death from cancer of the colon or rectum subsequent to polyposis,
in 91 very probably or certainly affected individuals in the present study and in that
of Dukes (1952), was 40.21 ± 1.23 years. From a survey of the University of Michigan
Hospital records between 1935 and 1944, the proportion of persons with multiple
polyposis among those dying before age 40 from cancer of the colon or rectum was
estimated. This estimate, which is minimal, is 0.103 ± 0.040. From these
facts and the known distribution of age at death from cancer of the colon and rectum
in Michigan, an estimate of the minimum frequency at birth of individuals with the
gene for multiple polyposis was obtained: (1.21 ± 0.49) × 10⁻⁴, or 1 in 8,300.

Two estimates of the relative fitness of individuals with the gene have been derived, 
the more reliable being that of the “relative reproductive span.” This is a weighted 
measure of the survival, to and through the reproductive period, of persons with the 
gene relative to that of persons lacking the gene. This estimate is 0.78. Using these 
estimates for frequency and relative fitness, and considering the known biases 
involved, the mutation rate is estimated to be 1-3 × 10⁻⁵/gene/generation.

An appendix derives equations for the estimation of relative reproductive span. 
APPENDIX
DERIVATION OF EQUATIONS FOR ESTIMATING THE RELATIVE REPRODUCTIVE SPAN (RRS) 


The direct estimation of the relative fitness of individuals with a disease like
multiple polyposis, from their observed reproductive performance, may be un- 
satisfactory because of the late onset of the disease in some individuals. In such a 
situation an indirect estimate may be necessary. If reliable data are available on 
the age at death due to the disease and the assumption is justified that early death 
from the disease is the only factor lowering the relative fitness of such individuals, 
it is possible to obtain an indirect estimate which may be more reliable than the 
direct. 

Consider two cohorts, $C_1$ and $C_2$, each containing N new-born individuals (N
being large). The cohorts differ only in that all members of $C_1$ have the gene for
polyposis and all members of $C_2$ lack it. If the cohorts are enumerated each year
from birth ($x = 0$) to death, and births to cohort members at each year $x$ are noted,
we may define the following terms:

$$f_x = \text{age-specific fertility} = \frac{\text{number of births to a cohort in the year } x}{\text{number of individuals of the cohort alive at the beginning of year } x},$$

$l_x$ = proportion of $C_1$ surviving to age $x$,

$L_x$ = proportion of $C_2$ surviving to age $x$.

$f_x$ is postulated to be equal for the two cohorts, i.e., as long as an individual with
the gene is living he is assumed to be as fertile as individuals lacking the gene. (The
small bias introduced by this assumption is discussed in the text.)

Our observed data on births and deaths are not in terms of cohorts but are obtained
from a population composed of all ages. It is therefore pertinent to note that
for a normal population in equilibrium the age distribution will be that of a "life
table population" (Dublin, Lotka, and Spiegelman, 1949), i.e., the proportion
of persons who are of age $x$ is proportional to $L_x$. We shall make use of this equivalence below. In terms of
the observed population data we may define two further terms:

$p_x$ = the proportion of births, among all births, which occur to parents of age $x$
(mean of values of paternal and maternal age distributions),

$d_i$ = the proportion of deaths, among all deaths from cancer secondary to polyposis,
which occur at age $i$.
The mean number of children (live-born) ever born to the N members of $C_1$ is
$\sum_x l_x f_x$, the summation extending to the longest life span. The corresponding
mean for the N members of $C_2$ is $\sum_x L_x f_x$. Therefore,

$$W = \text{relative fitness} = \frac{\sum_x l_x f_x}{\sum_x L_x f_x}.$$

But, for any age $j$, in the life table population

$$p_j = \frac{N L_j f_j}{N \sum_x L_x f_x}, \qquad \text{so that} \qquad f_j = \frac{p_j \sum_x L_x f_x}{L_j},$$

and hence

$$W = \sum_x p_x \frac{l_x}{L_x}$$

under the above assumptions, the summation again extending to the longest life span.

If the assumption stated above, i.e., that early death is the only effect of the
gene on fitness, does not hold, $\sum_x p_x (l_x/L_x)$ will not equal W. We may define

$$\mathrm{RRS} = \sum_x p_x \frac{l_x}{L_x}$$

as the relative reproductive span (RRS). Its usefulness for the present
study as a means of estimating W requires the above assumption, but, for other
traits where a direct estimate of W is available, it may be helpful in testing the
validity of this assumption.
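The step from $\sum_x l_x f_x / \sum_x L_x f_x$ to $\sum_x p_x (l_x/L_x)$ is an exact identity once $p_x$ is taken proportional to $L_x f_x$, as in a life-table population. A minimal numerical check in Python; the survivorship and fertility values are hypothetical, chosen only for illustration, and are not data from this study.

```python
def fitness_from_cohorts(l, L, f):
    """W as the ratio of mean live-born children per member of C_1 and of C_2."""
    return sum(lx * fx for lx, fx in zip(l, f)) / sum(Lx * fx for Lx, fx in zip(L, f))


def fitness_from_birth_distribution(l, L, f):
    """W recomputed as sum of p_x * l_x / L_x, with p_x the share of all births at
    age x in a life-table population (proportional to L_x * f_x)."""
    total = sum(Lx * fx for Lx, fx in zip(L, f))
    p = [Lx * fx / total for Lx, fx in zip(L, f)]
    return sum(px * lx / Lx for px, lx, Lx in zip(p, l, L))


# Hypothetical values for four age groups (not data from the study).
l = [1.00, 0.95, 0.85, 0.70]  # survivorship with the gene
L = [1.00, 0.98, 0.96, 0.93]  # survivorship without the gene
f = [0.00, 0.50, 0.80, 0.30]  # age-specific fertility, assumed equal in both cohorts

print(round(fitness_from_cohorts(l, L, f), 4),
      round(fitness_from_birth_distribution(l, L, f), 4))  # both 0.8881
```

Because the identity holds for any positive $L_x$, a disagreement between the two routes in real data would point to the assumption of equal $f_x$ failing, which is the use of RRS as a test suggested above.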
Let
$q_x$ = probability that a person with the gene for polyposis who reaches his $x$th
birthday will die from cancer secondary to polyposis before his $(x + 1)$th
birthday,
$r_x$ = probability that a person lacking the gene for polyposis who reaches his
$x$th birthday will die before his $(x + 1)$th birthday.
Since persons with the gene for polyposis, before onset of symptoms, are assumed
not to differ from persons without the gene, and, if after onset of symptoms they
are considered to be still subject to all the other causes of death from which persons
without the gene die, the probability that such a person will die within the year
following his $x$th birthday is $q_x + r_x - q_x r_x$ (very nearly; shorter time intervals
would make this more exact). Since $q_x r_x$, for ages of interest to us, i.e., up to the end
of the reproductive period, is small compared to $q_x$ and $r_x$, we may neglect this
term and write

$$l_x = 1 - \sum_{i=0}^{x-1} l_i (q_i + r_i)$$

and

$$L_x = 1 - \sum_{i=0}^{x-1} L_i r_i.$$
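Neglecting products of the form $q_i r_j$, etc., we may write

$$L_1 = l_1 + l_0 q_0,$$

$$L_2 = l_2 + l_1 q_1 + l_0 q_0,$$

and, in general,

$$L_x = l_x + \sum_{i=0}^{x-1} l_i q_i.$$

Therefore

$$L_i r_i = l_i r_i + r_i \sum_{j=0}^{i-1} l_j q_j$$

and

$$\frac{l_x}{L_x} = \frac{1 - \sum_{i=0}^{x-1} l_i (q_i + r_i)}{1 - \sum_{i=0}^{x-1} L_i r_i}.$$

Let

$$K_x = \sum_{i=0}^{x-1} r_i \sum_{j=0}^{i-1} l_j q_j.$$

Then

$$\frac{l_x}{L_x} = 1 - \frac{\sum_{i=0}^{x-1} l_i q_i - K_x}{1 - \sum_{i=0}^{x-1} l_i r_i - K_x}.$$

Since, nearly,

$$1 - \sum_{i=0}^{x-1} l_i r_i = \sum_{i=0}^{100} l_i q_i + \sum_{i=x}^{100} l_i r_i,$$

and putting

$$\sum_{i=0}^{100} l_i q_i = P,$$

where P is the proportion of persons born with the gene for polyposis who die from
cancer secondary to polyposis, and noting that, by definition,

$$\sum_{i=0}^{x-1} d_i = \frac{1}{P} \sum_{i=0}^{x-1} l_i q_i,$$

then

$$\frac{l_x}{L_x} = 1 - \frac{\sum_{i=0}^{x-1} d_i - K_x/P}{1 + \left(\sum_{i=x}^{100} l_i r_i - K_x\right)/P}.$$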
Since only $\sum_{i=0}^{x-1} d_i$ is known, it is necessary to consider the appropriateness of
estimating $l_x/L_x$ by $1 - \sum_{i=0}^{x-1} d_i$. The magnitude of P is not known with certainty,
but study of kindreds in which the gene for polyposis is segregating yields information,
since we find (in this study and in Dukes, 1952) that the observed proportion
of cases of polyposis among adult offspring of an affected parent approaches the
expected 0.5 and that large bowel cancer usually follows before age 50 is reached,
although there are several instances of later onset. It seems unlikely that during
the present century any large proportion of individuals born with the gene fails to
manifest multiple polyps and subsequent cancer. Such cancer, until recently, must
usually have proved fatal. The 1940 American life table indicates that about 83
per cent of live-born individuals will be alive at age 50, and therefore persons with
the gene are likely to survive to the age at which cancer from malignant degeneration
of polyps is prevalent. With these considerations, the value of P would be expected
to be at least of the order of 70 per cent. For low values of $x$, e.g., in the case of
multiple polyposis under 25 years of age, the expression for $l_x/L_x$ is seen to reduce to

$$1 - \frac{\sum_{i=0}^{x-1} d_i}{1/P} = 1 - P \sum_{i=0}^{x-1} d_i,$$

so that if $\sum_{i=0}^{x-1} d_i = 0.1$ and P = 0.7, $l_x/L_x = 0.93$, while the
approximation $1 - \sum_{i=0}^{x-1} d_i$ gives 0.90. A higher value for P, of course, makes
this approximation better. The approximation should be of this order in the middle-aged
range and, for high $x$, should be better yet.
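The size of the shortfall in the low-age case can be tabulated directly. The sketch below assumes the low-age form $1 - P\sum d_i$, which is the form that yields the 0.93 quoted above for $\sum d_i = 0.1$ and P = 0.7; the additional values of P are illustrative only.

```python
def ratio_low_age(sum_d, p):
    """Low-age form of l_x / L_x, 1 - P * sum(d_i); gives 0.93 for sum(d_i) = 0.1, P = 0.7."""
    return 1.0 - p * sum_d


def ratio_approx(sum_d):
    """The approximation used in the RRS formula: 1 - sum(d_i)."""
    return 1.0 - sum_d


for p in (0.7, 0.8, 0.9, 1.0):
    low_age, approx = ratio_low_age(0.1, p), ratio_approx(0.1)
    print(f"P = {p:.1f}: {low_age:.2f} vs {approx:.2f} (shortfall {low_age - approx:.2f})")
```

The shortfall is $(1 - P)\sum d_i$, so it shrinks to zero as P approaches 1, which is why the approximation can only underestimate $l_x/L_x$ and hence RRS.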


If we introduce this approximation to $l_x/L_x$ into our previous formula for RRS,
we have

$$\mathrm{RRS} = \sum_x p_x \left(1 - \sum_{i=0}^{x-1} d_i\right),$$

the relation to be derived. This estimate should slightly underestimate the true
value of RRS.


Aspects of Genetics in Psychology 


ARNE TRANKELL 
Department of Psychology, University of Göteborg


INTRODUCTION 


THE STUDY OF heredity in psychology has mainly concentrated on twins, with the
specific objectives conditioned by this kind of material. Only sporadic attempts
have been made to study heredity according to Mendelian principles. The reason for
this is probably to be found in the fact that most behavior patterns are not stable
enough to permit a Mendelian hypothesis. The question may be raised, however, whether
psycho-genetics does not have at its command an instrument which could be used
far more often than has been the case. This instrument is population genetics.

In his book "Mathematical Methods for Population Genetics", Dahlberg (1947)
has given formulae which are applicable to a number of problems in human
genetics. The working conditions encountered by psychology, especially as regards
environmental influences and methods of measurement, necessitate, however, the
construction of special formulae to meet its demands. Formulae not based on
Mendelian principles are, of course, out of the question. The suggestions presented
in this paper are therefore nothing but applications of the Mendelian principles to
problems of particular interest to the psychologist.

In order to be able to draw any conclusions regarding the course of inheritance of 
a specific trait it is necessary to know the proportions of the phenotypes that may 
occur. This may be possible when panmixia prevails, which is probably the case for
a number of mental traits with a hereditary background. The simplest case is found 
when the inheritance is monofactorial and there are no more than two alleles. This 
paper mainly deals with this case. 

When it comes to mental traits, possible deviations from the classical Mendelian
principles of manifestation must be taken into account. The reason is that psychology
has to work with behavior and not with morphological traits, and behavior is
less constant and more easily influenced by environmental factors than other
hereditarily conditioned expressions. It is probably more the rule than the exception that
behavior tendencies are so influenced by the environment that the original effect of
the genotype can no longer be observed. If we suppose, for instance, that a certain
behavior pattern is promoted by a dominant gene and that the proportion between
the alleles is such that the majority are characterized by the dominant behavior,
a social pressure may arise in the population promoting the dominant behavior at
the expense of the recessive behavior. The recessive homozygotes are thereby forced
to adopt the more sanctioned dominant behavior. An example of this is found in the
striving of left-handed persons to accept and adopt right-handed behavior, which is
expressed in a decreasing frequency of left-handedness with increasing age.
Another difficulty to be mastered by psycho-genetics lies in the technique of
measurement. The recording of behavior tendencies is more difficult than any other.
This is due mainly to the fact that it is nearly impossible to obtain generally accepted
criteria, and secondarily to the inconsistency of behavior itself. We
therefore have to take into account that, because of technical difficulties of measurement,
every recording of trait-carriers will end up more or less incomplete. This indicates
that, in what is technically referred to as the “incomplete penetrance” of the gene,
we have also to consider a certain decrease in manifestation because of incomplete
recording of the trait-carriers.


A THEORETICAL MODEL 


Many traits with a hereditary background are influenced by a directed reduction 
in manifestation, which means that the environment and the incomplete recording 
are both active in a particular direction, as in handedness from the left-handed 
toward the right-handed behavior. When this is the case, we may mathematically 
eliminate the effect of the incomplete manifestation by including in the calculation 
the deviations in the proportions of genotypes within different groups caused by the 
incomplete penetrance. In some cases it will not only be possible to test the genetic 
hypothesis, but also to calculate the proportion of the alleles in the studied population. 

One way to perform this task will now be described for the case when the theo- 
retical model assumes complete penetrance of the dominant trait in the heterozygotes 
and the dominant homozygotes, but only partial penetrance of the recessive trait in 
the recessive homozygotes. 

As a point of departure we assume that there are two alleles designated D and R. Their relative proportions are designated d and r (d + r = 1). We further assume that panmixia prevails for the genotypes in question. According to the Hardy-Weinberg law the proportions of the different genotypes are $d^2$ DD, $2dr$ DR and $r^2$ RR, where $d^2 + 2dr + r^2 = 1$. Connected with these genotypes we assume two alternative phenotypes, non-trait-carriers and trait-carriers. The trait is recessive and connected with the allele R. It is only partially manifested by the recessive homozygotes.

The studied population (a number of families of the main population chosen at 
random) is divided into two generations: P-generation (the parents) and F-genera- 
tion (the children). Since we have to consider different manifestations at different 
age-levels and the recording instruments may be differently effective at different 
age-levels, the frequency of individuals recorded as trait-carriers in the P-generation 
is designated a, and the same proportion in the F-generation is designated b. 

The F-generation is divided into three sub-groups according to the recorded pheno- 
type of the parents. Group I contains children whose parents both have been re- 
corded as trait-carriers; group II children belonging to families where one of the 
parents has been recorded as a trait-carrier, while group III includes children of 
families where none of the parents has been recorded as a trait-carrier. 

According to the given definitions the proportions of the various genotypes in the P-generation are as follows (the recessive homozygotes manifesting the recessive trait have been given index a, and the rest of the recessive homozygotes index n):

| Recorded phenotype | Genotype | Proportion |
|---|---|---|
| Non-trait-carriers | DD | $d^2$ |
| | DR | $2dr$ |
| | $RR_n$ | $r^2 - a$ |
| Trait-carriers | $RR_a$ | $a$ |


In family-type I the genotypes of the children are determined by the following type of mating: $(RR_a)(RR_a)$. All individuals in the F-generation will therefore be of genotype RR, and the proportion of recessive homozygotes among the descendants is consequently:

$$\frac{a^2}{a^2} = 1$$
In family-type II the progeny is determined by the combination $2(DD + DR + RR_n)(RR_a)$. The proportion of recessive homozygotes among the descendants is consequently (using $dr + r^2 = r$):

$$\frac{2dra + 2a(r^2 - a)}{2a(1 - a)} = \frac{r - a}{1 - a}$$


In family-type III the progeny is determined by the combination $(DD + DR + RR_n)(DD + DR + RR_n)$. The proportion of recessive homozygotes among the descendants is consequently:

$$\frac{d^2r^2 + 2dr(r^2 - a) + (r^2 - a)^2}{(1 - a)^2} = \frac{(r - a)^2}{(1 - a)^2}$$
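The three proportions can be checked mechanically. The sketch below is an illustration, not part of the original analysis: it enumerates all random matings between the recorded phenotype classes with exact fractions and compares the share of RR children with the closed forms above. The labels RR_a and RR_n, the function names, and the values d = 3/5, a = 1/10 are illustrative assumptions.

```python
from fractions import Fraction

# Probability that a parent of each genotype transmits the R allele.
P_TRANSMIT_R = {"DD": Fraction(0), "DR": Fraction(1, 2),
                "RR_n": Fraction(1), "RR_a": Fraction(1)}

TRAIT = ("RR_a",)                 # recorded as trait-carriers
NON_TRAIT = ("DD", "DR", "RR_n")  # recorded as non-trait-carriers


def p_generation(d, a):
    """Hardy-Weinberg proportions, with RR split into recorded (RR_a) and unrecorded (RR_n)."""
    r = 1 - d
    return {"DD": d * d, "DR": 2 * d * r, "RR_n": r * r - a, "RR_a": a}


def rr_share(d, a, group1, group2):
    """Share of RR children when a parent from group1 mates at random with one from group2."""
    freq = p_generation(d, a)
    total = rr = Fraction(0)
    for g1 in group1:
        for g2 in group2:
            weight = freq[g1] * freq[g2]
            total += weight
            rr += weight * P_TRANSMIT_R[g1] * P_TRANSMIT_R[g2]
    return rr / total


def closed_forms(d, a):
    """The expressions derived in the text for family-types I, II and III."""
    r = 1 - d
    return (Fraction(1), (r - a) / (1 - a), (r - a) ** 2 / (1 - a) ** 2)


d, a = Fraction(3, 5), Fraction(1, 10)  # illustrative values with r**2 > a
enumerated = (rr_share(d, a, TRAIT, TRAIT),
              rr_share(d, a, TRAIT, NON_TRAIT),
              rr_share(d, a, NON_TRAIT, NON_TRAIT))
print(enumerated == closed_forms(d, a))  # True

# Weighting by the mating frequencies a^2, 2a(1 - a), (1 - a)^2 recovers r^2.
overall = (a**2 * enumerated[0] + 2 * a * (1 - a) * enumerated[1]
           + (1 - a) ** 2 * enumerated[2])
print(overall == (1 - d) ** 2)  # True
```

The second check is a convenient consistency test: summed over all family-types, the share of recessive homozygotes must return the Hardy-Weinberg value $r^2$.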


With knowledge of the proportions of RR in the three subgroups of the F-generation it is possible to formulate expressions for the recorded number of individuals who manifest the recessive trait within each group. The following designations are then used. The total numbers of children within the subgroups are designated $N_1$, $N_2$ and $N_3$, and the recorded numbers of children who manifest the recessive trait are designated $X_1$, $X_2$ and $X_3$. The coefficient $b/r^2$ indicates the portion of recessive homozygotes in the F-generation recorded as trait-carriers under the prevailing penetrance conditions, since b is the fraction of recorded trait-carriers in the F-generation, while $r^2$ is the proportion of all recessive homozygotes. We then get the following equations:


Family-type I

$$N_1\,\frac{b}{r^2} = X_1 \qquad (1)$$

Family-type II

$$N_2\,\frac{b}{r^2}\cdot\frac{r - a}{1 - a} = X_2 \qquad (2)$$

Family-type III

$$N_3\,\frac{b}{r^2}\cdot\frac{(r - a)^2}{(1 - a)^2} = X_3 \qquad (3)$$

In applying the formulae, only r-values consistent with the original hypothesis are accepted. When the penetrance of the recessive homozygotes is at its maximum value, i.e., when all recessive homozygotes are recorded as trait-carriers, a is equal to $r^2$.

This gives us the theoretical minimum value of r, namely $r_{\min} = \sqrt{a}$ (if b is greater than a, $r_{\min} = \sqrt{b}$). Values of r which are smaller than the minimum value cannot be accepted.

When we are in possession of a set of empirical data, the Mendelian hypothesis may be tested by χ²-analyses of the deviations between the actual values and the expected numbers of recorded trait-carriers in the sub-groups corresponding to different r-values. The analyses can be restricted to r-values around an approximately determined value, which may be arrived at by inserting into the equations the known values of N, X, a and b and solving for r. The three determinations of r thus obtained will be approximately equal when the hypothesis cannot be rejected. In this case we will always find a range of r within which each r-value describes the variation of frequencies of trait-carriers in the sub-groups. The r-value giving the smallest χ² may be taken as an estimate of the proportion of R alleles in the population in question. This estimate is independent of the extent of the penetrance. The determination of the proportions of the alleles is therefore independent of the criteria used in recording the trait-carriers.
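The approximate determination of r described above can be sketched in closed form: equation (1) is solved directly, equation (2) is rearranged into the quadratic $X_2(1-a)r^2 - N_2 b\,r + N_2 b\,a = 0$ (the larger root is taken, which for these data is the only one above $r_{\min}$), and equation (3) is solved after taking square roots, assuming $r > a$. This is an illustrative sketch rather than the author's procedure; the function names are hypothetical, and the inputs are Rife's figures with a and b rounded as in his section below.

```python
import math


def r_from_type_I(n1, x1, b):
    """Equation (1): N1 * b / r**2 = X1, solved for r."""
    return math.sqrt(n1 * b / x1)


def r_from_type_II(n2, x2, a, b):
    """Equation (2) as X2 (1 - a) r**2 - N2 b r + N2 b a = 0; the larger root is returned."""
    qa, qb, qc = x2 * (1 - a), -n2 * b, n2 * b * a
    return (-qb + math.sqrt(qb * qb - 4 * qa * qc)) / (2 * qa)


def r_from_type_III(n3, x3, a, b):
    """Equation (3): taking square roots, (r - a) / r = (1 - a) * sqrt(X3 / (N3 b))."""
    k = (1 - a) * math.sqrt(x3 / (n3 * b))
    return a / (1 - k)


# Rife's figures, with a and b rounded as in the text.
a, b = 0.0524, 0.0877
r_min = math.sqrt(max(a, b))
r1 = r_from_type_I(11, 6, b)
r2 = r_from_type_II(174, 34, a, b)
r3 = r_from_type_III(1993, 151, a, b)
print(f"r1 = {r1:.3f}, r2 = {r2:.3f}, r3 = {r3:.3f}, r_min = {r_min:.3f}")
```

The three values agree with the r-values 0.401, 0.414 and 0.439 reported for Rife's material, and all lie above $r_{\min} = \sqrt{b}$.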


INHERITANCE OF HANDEDNESS 
1. Hypothesis 


The procedure will be exemplified through analyses of three studies of the inheritance of left-handedness. All three were carried out in the USA and are the only internationally known studies of left-handedness permitting an application of population genetics. They were performed at different times and are distinguished by entirely different criteria of left-handedness. Widely varying frequencies of left-handedness have therefore been obtained. The analyses show, however, that the three populations have the same genetic constitution.

If as a point of departure we assume that right-handedness is conditioned by the dominant gene in a pair of alleles, it will hold true that left-handedness appears only in the absence of the dominant gene. Because of the social pressure promoting right-handed behavior, we must allow for the possibility that a portion of the recessive homozygotes suppress their recessive behavior tendencies so strongly that they cannot be distinguished from individuals who are right-handed because of the dominant gene. The more incomplete the recording of left-handed tendencies is, the smaller is the fraction of recessive homozygotes recorded as trait-carriers. It may also be assumed that the absence of the dominant gene activates other genes which control the strength of the left-handed tendencies. It is also possible that the weaker forms of left-handedness cannot be distinguished from the right-handed behavior determined by the action of the dominant gene.
2. Rife’s material

A well-known investigation of the inheritance of left-handedness was made by D. C. Rife (1940). It consists of two parts: a twin study implying the presence of genetically determined differences in degrees of left-handed tendencies, and a population-genetic material including 687 families collected by Rife among the students at Ohio State University. On the basis of this latter material Rife concluded that “left-handers are more likely to have left-handed children than are the right-handers”. The criterion of left-handedness used by Rife was 10 selected acts known to be carried out by most people with their right hand. Individuals indicating the use of their left hand in one or more of these 10 acts were recorded as left-handed. The P-generation contained 1374 individuals, 72 of which were recorded as left-handed (a = 0.0524).

The F-generation contained 2178 individuals, 191 of which were recorded as left- 
handed (b = 0.0877). 

Family-type I (both parents left-handed) contained 11 children, 6 of which were recorded as left-handers. An approximate r-value is determined by the equation

$$\frac{0.0877}{r^2}\cdot 11 = 6,$$

which gives r = 0.401.

Family-type II (one parent left-handed) contained 174 children, 34 of which were recorded as left-handers. An approximate r-value is determined by the equation

$$\frac{0.0877}{r^2}\cdot\frac{r - 0.0524}{0.9476}\cdot 174 = 34,$$

which gives r = 0.414.

Family-type III (both parents right-handed) contained 1993 children, 151 of which were recorded as left-handers. An approximate r-value is determined by the equation

$$\frac{0.0877}{r^2}\cdot\frac{(r - 0.0524)^2}{0.9476^2}\cdot 1993 = 151,$$

which gives r = 0.439.

The three determinations of r correspond well. The χ²-analyses show that r-values within the range 0.35-0.51 are acceptable at the 5 per cent level, but values outside these limits make the hypothesis unacceptable. The smallest χ²-sum is obtained when r equals 0.410, a value which may be regarded as an estimate of the proportion of R in the population from which Rife’s material was collected. The deviations between the actual values and the values corresponding to various r-hypotheses are given in Table 1.
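The χ²-analysis can be reproduced approximately. The sketch below is an illustration under stated assumptions rather than the author's computation: it computes expected numbers of left- and right-handed children in the three family-types from equations (1)-(3), sums χ² over the six cells as in Table 1, and scans r in steps of 0.01. It assumes Rife's counts as given above, a and b rounded to four decimals, and the 5 per cent point of χ² with one degree of freedom (3.841); the names are hypothetical.

```python
# Rife's counts as given in the text: (children N, recorded left-handers X) per family-type.
RIFE = [(11, 6), (174, 34), (1993, 151)]
A, B = 0.0524, 0.0877       # a and b as rounded in the text
CHI2_5PCT_1DF = 3.841       # 5 per cent point of chi-square, one degree of freedom


def expected_left_share(r, a=A, b=B):
    """Expected share of recorded left-handers in family-types I, II, III (equations 1-3)."""
    base = b / r**2
    q = (r - a) / (1 - a)
    return [base, base * q, base * q**2]


def chi2_sum(r, data=RIFE):
    """Chi-square summed over six cells: left- and right-handed children in each family-type."""
    total = 0.0
    for (n, x), p in zip(data, expected_left_share(r)):
        e_left, e_right = n * p, n * (1 - p)
        total += (x - e_left) ** 2 / e_left + ((n - x) - e_right) ** 2 / e_right
    return total


grid = [i / 100 for i in range(30, 61)]   # r = 0.30, 0.31, ..., 0.60 (all above r_min)
accepted = [r for r in grid if chi2_sum(r) < CHI2_5PCT_1DF]
print(f"chi2 at r = 0.41: {chi2_sum(0.41):.3f}")
print(f"accepted r-values: {min(accepted):.2f} to {max(accepted):.2f}")
```

With these inputs the sum at r = 0.41 comes out close to the tabulated 0.089, and the accepted grid values run from 0.35 to 0.51, matching the limits quoted above. Small differences in the third decimal arise because Table 1 rounds the expected numbers to two decimals.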


TABLE 1. A COMPARISON BETWEEN VALUES CORRESPONDING TO DIFFERENT r-HYPOTHESES AND ACTUAL FREQUENCIES IN RIFE’S MATERIAL

(Numbers of children recorded as left-handed and right-handed in each family-type.)

| | Type I, left | Type I, right | Type II, left | Type II, right | Type III, left | Type III, right | χ²* |
|---|---|---|---|---|---|---|---|
| Actual values | 6 | 5 | 34 | 140 | 151 | 1842 | |
| r = 0.51 | 3.71 | 7.29 | 28.33 | 145.67 | 156.71 | 1836.29 | 3.714 |
| r = 0.50 | 3.86 | 7.14 | 28.83 | 145.17 | 155.99 | 1837.01 | 3.243 |
| r = 0.48 | 4.19 | 6.81 | 29.89 | 144.11 | 154.47 | 1838.53 | 2.030 |
| r = 0.46 | 4.56 | 6.44 | 31.02 | 142.98 | 152.83 | 1840.17 | 1.149 |
| r = 0.44 | 4.98 | 6.02 | 32.24 | 141.76 | 151.05 | 1841.95 | 0.490 |
| r = 0.41 | 5.74 | 5.26 | 34.26 | 139.74 | 148.08 | 1844.92 | 0.089 |
| r = 0.38 | 6.68 | 4.32 | 36.53 | 137.47 | 144.67 | 1848.33 | 0.697 |
| r = 0.36 | 7.44 | 3.56 | 38.22 | 135.78 | 142.11 | 1850.89 | 2.057 |
| r = 0.35 | 7.88 | 3.12 | 39.12 | 134.88 | 140.73 | 1852.27 | 3.252 |

* The number of degrees of freedom is one, as the information derived from the empirical data consists of five parameters.


3. Chamberlain’s material


In 1927 an investigation of the inheritance of left-handedness was performed by Chamberlain (1928). The material included 2177 families collected at Ohio State University. The criterion of left-handedness was the writing hand. Besides the families collected at random, his report includes some families of type I with which he came into contact through the newspapers. These families were added by Chamberlain to the material and used in his analysis, but thanks to his detailed report it is possible to exclude them from the calculations, which is necessary for a correct population-genetic analysis. Lacking the means to perform such an analysis, Chamberlain could only draw the conclusion that handedness must be hereditary. The frequencies found did not seem to correspond to any known Mendelian formula.

The P-generation contained 4354 individuals, 155 of which used their left hand in writing and were therefore recorded as left-handers (a = 0.0356).

The F-generation contained 7714 individuals, 367 of which were recorded as left-handers (b = 0.0476).

Family-type I (both parents left-handed) contained 25 children ($N_1 = 25$), 7 of which were recorded as left-handers ($X_1 = 7$). Equation (1) gives an r-value of 0.412.

Family-type II (one parent left-handed) contained 464 children ($N_2 = 464$), 53 of which were recorded as left-handers ($X_2 = 53$). Equation (2) gives an r-value of 0.393.

Family-type III (both parents right-handed) contained 7225 children ($N_3 = 7225$), 307 of which were recorded as left-handers ($X_3 = 307$). Equation (3) gives an r-value of 0.401.

The three r-determinations correspond as well in this material as in Rife’s. The χ²-analyses show that r-values within the range 0.33-0.49 are acceptable at the 5 per cent level, but values outside these limits make the hypothesis unacceptable. The smallest χ²-sum is obtained when r equals 0.402, which may be regarded as an estimate of R in the population from which Chamberlain’s material was collected.


4. Ramaley’s material 


The oldest study dealt with in this context was performed by Francis Ramaley (1913). The material was collected at the University of Colorado and covers 305 families. The report gives no information about the criterion used. On the basis of his figures Ramaley concluded that left-handedness is inherited as a Mendelian recessive, the frequencies of the genotypes being assumed to be in the proportions 9:12:4 (the most probable constant proportion that he found among several enumerated ones containing small whole numbers). The reason behind this choice, which gives 16 per cent recessive homozygotes, was that he had determined the frequency of left-handedness in the F-generation to be 15.66 per cent. The percentage of left-handedness in the P-generation, only half as large, was explained as the result of the parents having concealed their left-handedness to a great extent. The occurrence of a right-handed child in family-type I was thought to contradict the Mendelian hypothesis and could not be explained by Ramaley.

The P-generation contained 610 individuals, 49 of which were recorded as left- 
handers (a = 0.0803). 

The F-generation contained 1130 individuals, 177 of which were recorded as left- 
handers (b = 0.1566). 

Family-type I (both parents left-handed) contained 7 children ($N_1 = 7$), 6 of which were recorded as left-handers ($X_1 = 6$). Equation (1) gives an r-value of 0.427.

Family-type II (one parent left-handed) contained 170 children ($N_2 = 170$), 55 of which were recorded as left-handers ($X_2 = 55$). Equation (2) gives an r-value of 0.427.

Family-type III (both parents right-handed) contained 953 children ($N_3 = 953$), 116 of which were recorded as left-handers ($X_3 = 116$). Equation (3) gives an r-value of 0.425.

The three r-determinations give practically the same values. The χ²-analyses show that r-values within the range 0.40-0.51 are acceptable at the 5 per cent level, but values outside these limits make the hypothesis unacceptable. The smallest χ²-sum is obtained when r equals 0.427, which may be regarded as an estimate of R in the population from which Ramaley’s material was collected.


5. Discussion 


In all three population-genetic analyses of the inheritance of handedness the proposed model describes the empirical data completely. It thus seems probable that a dominant-recessive pair of alleles regulates the manifestation of handedness, the dominant gene conditioning right-handed behavior while left-handed behavior is found in the absence of this gene. According to the model, the environment and possibly other genes cause a reduction of the manifestation of the recessive allele, i.e., some recessive homozygotes express a behavior pattern which cannot be distinguished from the behavior of those individuals whose manual habits are determined by the dominant gene. Another possibility consistent with the model is given by the assumption that right-handed behavior is determined by the total genotype, while left-handedness is caused by a diallelic gene, one allele of which has no influence on the total genotype, while the other, in double dose, has a modifying effect which is counteracted by the environment and, because of incomplete recording methods, is not always observable. The well-known differences in strength of the left-handed tendencies may in this case, too, be explained by the influence exerted by other genes.¹


¹ Dahlberg (1926) and others have found that identical twins are frequently discordant with respect to handedness. Since identical twins must be assumed to have the same constitution, the discordance has been taken to prove that hand preference could not be hereditary, at least not determined by the usual gene action. Dahlberg has assumed that genotypical asymmetries, which are fairly common and to which he provisionally refers handedness, arise independently of the usual actions of the genes. In line with this opinion the discordance of the twins is an expression of such a mechanism: the twins are then each considered to be one half of the original, undivided embryo, divided into two asymmetrical parts by the twinning mechanism. The concordance among identical twins may be assumed to be due to the early timing of the parting of the embryo. That a mechanism of the type mentioned can be effective in twinning cannot be rejected. There are, however, no experimental or empirical findings indicating that the same or a similar mechanism could be the cause of side-dominance in single-born children. For the present, it seems most correct to accept the possibility that the handedness of identical twins can be changed by the twinning mechanism. Deviations from expected Mendelian values in the population caused by such a mechanism are, however, most likely too small to be noticed in a population-genetic analysis.


TABLE 2. THE INHERITANCE OF HANDEDNESS IN THREE AMERICAN INVESTIGATIONS

| | Ramaley 1913 | Chamberlain 1928 | Rife 1940 |
|---|---|---|---|
| Parental generation | | | |
| Total | 610 | 4354 | 1374 |
| Left-handers | 49 | 155 | 72 |
| a | 0.0803 | 0.0356 | 0.0524 |
| Filial generation | | | |
| Total | 1130 | 7714 | 2178 |
| Left-handers | 177 | 367 | 191 |
| b | 0.1566 | 0.0476 | 0.0877 |
| Type I (left × left) | | | |
| $N_1$ | 7 | 25 | 11 |
| $X_1$ | 6 | 7 | 6 |
| $r_1$ | 0.427 | 0.412 | 0.401 |
| Type II (left × right) | | | |
| $N_2$ | 170 | 464 | 174 |
| $X_2$ | 55 | 53 | 34 |
| $r_2$ | 0.427 | 0.393 | 0.414 |
| Type III (right × right) | | | |
| $N_3$ | 953 | 7225 | 1993 |
| $X_3$ | 116 | 307 | 151 |
| $r_3$ | 0.425 | 0.401 | 0.439 |
| Acceptance limits | 0.40-0.51 | 0.33-0.49 | 0.35-0.51 |
| Minimum χ²-sum | 0.001 | 0.050 | 0.089 |
| Estimate of R | 0.427 | 0.402 | 0.410 |


The frequency of the R-gene for left-handedness lies between 40 and 43 per cent in all three studies. The mean of the established values is 41.3 per cent. The percentages of the various genotypes in the population of the United States corresponding to this value are DD: 34.5 per cent, DR: 48.5 per cent and RR: 17 per cent.
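The genotype percentages quoted above follow from the Hardy-Weinberg proportions $d^2$, $2dr$ and $r^2$ with r set to the mean of the three estimates in Table 2. A minimal check, with a hypothetical variable layout:

```python
estimates = {"Ramaley 1913": 0.427, "Chamberlain 1928": 0.402, "Rife 1940": 0.410}

r = sum(estimates.values()) / len(estimates)  # mean frequency of R
d = 1 - r                                     # frequency of D
genotypes = {"DD": d * d, "DR": 2 * d * r, "RR": r * r}

print(f"mean r = {r:.3f}")
for name, share in genotypes.items():
    print(f"{name}: {100 * share:.1f} per cent")
```

RR comes out at about 17.1 per cent, which the text rounds to 17.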
APPLICATION TO SCHIZOPHRENIA


The hereditary background of schizophrenia has been dealt with by Strömgren (1938), among others. In his dissertation he presented a hypothesis concerning its inheritance which corresponds almost exactly to the theory proposed for left-handedness. Strömgren is of the opinion that predisposition to schizophrenia depends on the presence of a recessive gene, but that the disorder appears only in a fraction of those who are homozygous for this gene. The incomplete manifestation is partly the effect of the environment and partly the effect of subordinate hereditary factors determining the degree of disposition to the disorder or the resistance to it. Strömgren’s material is composed of disordered individuals and their families, thus giving no information on the frequency of schizophrenics within the main population to which these families belong. In order to be able to test the hypothesis, Strömgren is forced to make some additional assumptions regarding the probability that a recessive homozygote becomes ill. This probability is designated m (Manifestationswahrscheinlichkeit, probability of manifestation), while the proportion of individuals becoming ill within a specific group is designated KE (Krankheitserwartung, morbidity expectancy). With the support of these designations and the definition m + n = 1, Strömgren indicates how the frequency of ill individuals in the F-generation is regulated for different types of mating.

For an average population he gives the following formula:

$$KE = m\,r^2 \qquad (4)$$

When one of the parents is ill:

$$KE = m\,\frac{dr + n r^2}{d^2 + 2dr + n r^2} \qquad (5)$$

When none of the parents is ill:

$$KE = m\,\frac{d^2 + 2ndr + n^2 r^2}{4d^2 + 4ndr + n^2 r^2} \qquad (6)$$

Since KE in the empirical material is a function of the two unknown values m and r (n and d may be expressed in these values), these formulae cannot be used for an experimental test of the hypothesis of inheritance. Strömgren therefore resorts to using hypothetical values of m, "estimated roughly" (schätzungsweise veranschlagt). By inserting an arbitrary value of m into the equation for an average population, the KE of which he estimates to be 0.0075, he obtains a value of r that is equally arbitrary. This r-value and other suitable m-values are then inserted into equations (5) and (6), whereby he obtains an approximate correspondence to the frequency of illness in his own material.

This correspondence is not necessarily caused by the genetic constitution of his material, since we do not know whether the inserted values of m and r are really valid for the material in question.
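The relation between Strömgren's formulas and the family-type formulas of the theoretical model can be illustrated numerically. The sketch below is an illustration, not Strömgren's procedure: it substitutes $m = b/r^2$ and $n = 1 - a/r^2$ into formulas (4)-(6) and compares the results with the population-genetic expressions for family-types II and III. The values a = 0.0524, b = 0.0877 and r = 0.41 are borrowed from the handedness data purely as an example.

```python
import math


def stromgren(m, n, d, r):
    """KE according to Strömgren's formulas (4), (5) and (6)."""
    ke_average = m * r**2
    ke_one_ill = m * (d * r + n * r**2) / (d**2 + 2 * d * r + n * r**2)
    ke_none_ill = (m * (d**2 + 2 * n * d * r + n**2 * r**2)
                   / (4 * d**2 + 4 * n * d * r + n**2 * r**2))
    return ke_average, ke_one_ill, ke_none_ill


# Illustrative values only (handedness-like magnitudes, not Strömgren's data).
a, b, r = 0.0524, 0.0877, 0.41
d = 1 - r
m_f = b / r**2        # manifestation probability in the F-generation
n = 1 - a / r**2      # non-manifestation, taken from the P-generation

ke4, ke5, ke6 = stromgren(m_f, n, d, r)
type_II = (b / r**2) * (r - a) / (1 - a)                  # population-genetic family-type II
type_III = (b / r**2) * (r - a) ** 2 / (1 - a) ** 2       # population-genetic family-type III
stromgren6_reduced = (b / r**2) * (r - a) ** 2 / (1 - a - d**2) ** 2

print(math.isclose(ke4, b), math.isclose(ke5, type_II),
      math.isclose(ke6, stromgren6_reduced), math.isclose(ke6, type_III))
# True True True False
```

Formula (5) coincides with the family-type II expression, whereas (6) reduces to a form with $(1 - a - d^2)^2$ in the denominator and so differs from the family-type III expression, because it presupposes a proband and thereby excludes dominant homozygous parents.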

If we turn from the proband method to a population-genetic method, this is clarified further.² If we choose to use families selected at random from the main population instead of the families of ill individuals, m may be expressed directly as the ratio between a value obtainable from the material (a or b) and $r^2$. The number of unknowns in the geno-statistical equations is thereby reduced to one, and the hypothesis may be tested as described above.


² The connections between Strömgren’s equations and the general formulae are the following. In Strömgren’s version, equation (1) refers to the entire population. If we put $m = a/r^2$, KE is found to equal a in the P-generation; if we put $m = b/r^2$, KE is found to equal b in the F-generation. In the population-genetic theory these are the definitions of a and b. Strömgren’s equation (2) is identical to the general formula for family-type II in the population-genetic theory. Since $KE = X_2/N_2$, and $m = a/r^2$ in the P-generation and $b/r^2$ in the F-generation, Strömgren’s equation may be written

$$\frac{X_2}{N_2} = \frac{b}{r^2}\cdot\frac{dr + (1 - a/r^2)\,r^2}{d^2 + 2dr + (1 - a/r^2)\,r^2} = \frac{b}{r^2}\cdot\frac{r - a}{1 - a}$$

Strömgren’s equation (3), on the other hand, is not identical to the formula for family-type III in the population-genetic theory. It is true that it refers to the case when both parents are normal, but it presumes that the parents have an ill individual among their descendants who is excluded from the material (the proband). This limits the genotypes of the parents to heterozygotes and normal recessive homozygotes, while the general formula also includes parents who are dominant homozygotes. The missing contribution of the latter to the descendants of this family-type ($d^2$) becomes apparent when the equation is developed by substituting m as before. Strömgren’s formula (3) may thus be written

$$KE = \frac{b}{r^2}\cdot\frac{d^2 + 2(1 - a/r^2)\,dr + (1 - a/r^2)^2 r^2}{4d^2 + 4(1 - a/r^2)\,dr + (1 - a/r^2)^2 r^2} = \frac{b}{r^2}\cdot\frac{\bigl(d + (1 - a/r^2)\,r\bigr)^2}{\bigl(2d + (1 - a/r^2)\,r\bigr)^2} = \frac{b}{r^2}\cdot\frac{(d + r - a/r)^2}{(2d + r - a/r)^2} = \frac{b}{r^2}\cdot\frac{(r - a)^2}{(2dr + r^2 - a)^2} = \frac{b}{r^2}\cdot\frac{(r - a)^2}{(1 - a - d^2)^2},$$

whereas the general formula for family-type III has $(1 - a)^2$ in the denominator.


A statistically valid test of Strömgren’s hypothesis thus demands a material collected at random. If the expected confirmation is obtained, the existence of a dominant-recessive factor pair may be established. The most important step toward establishing the mode of inheritance is thereby taken, although the differentiation of the recessive homozygotes remains to be clarified. Strömgren writes (in translation): “Demonstrating that a recessive gene pair is obligatory in the genesis of schizophrenia does not, of course, exhaust the question of the mode of inheritance; but the other genes coming into consideration will nevertheless be of secondary importance, since they can promote (or inhibit) the manifestation of the disease but are not obligatory for it (or, respectively, incompatible with it).”


A HYPOTHESIS FOR DYSLEXIA


The most outstanding technical characteristic of a psychological genetics based on population statistics is the consideration given to the varying manifestation of the genotypes (caused by environmental and measurement-technical influences). In order to test a hypothesis regarding the inheritance of a certain trait, we need formulae that include the influence of these factors. The hypothesis dealt with hitherto is clearly not the only one for which such formulae can be constructed.

In his study of dyslexia Hallgren (1950) has shown that the inheritance of the typical reading and writing difficulties is most likely autosomal, dominant and monofactorial in character. Hallgren’s investigation is based on proband methods, which do not give complete information on the genetic constitution of the population. A doubt may be raised concerning Hallgren’s way of assuming equality between the number of diagnosed cases of dyslexia and the number of corresponding genotypes, because there are good reasons to believe that some cases of dyslexia escape recording due to imperfect recording methods and because of environmental influences (the school and its education) working against the manifestation of the gene.

Hallgren’s hypothesis may thus be formulated in a population-genetic way, opening it to experimental testing of the same type as the hypotheses for left-handedness and schizophrenia. This hypothesis may primarily be formulated in the following manner.

Dyslexia is inherited as an autosomal, monofactorial dominant. It is always mani- 
fested when the dominant gene is present in a double set. We must take into account 
that some of the heterozygotes will be recorded as normals in experimental work 
because of the environmental pressure working toward normal behavior, and because 
of incomplete recording. Heterozygotes will therefore be found both among those 
recorded as word-blind and those recorded as normals. Recessive homozygotes on 
the other hand, display only normal behavior. The analysis is carried out with the 
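concepts used in the preceding discussions. The proportion of trait-carriers in the P-generation is designated a, and in the F-generation b. According to the definitions, the proportions of the various genotypes in the P-generation amount to the following (heterozygotes recorded as trait-carriers are given index a, and the other heterozygotes index n):

| Recorded phenotype | Genotype | Proportion |
|---|---|---|
| Trait-carriers | DD | $d^2$ |
| | $DR_a$ | $a - d^2$ |
| Normals | $DR_n$ | $2dr + d^2 - a$ |
| | RR | $r^2$ |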


The descendants in the different family-types are determined by the following 
combinations: 


Type I (trait-carrier)(trait-carrier): $(DD + DR_a)(DD + DR_a)$
Type II (trait-carrier)(normal): $2(DD + DR_a)(DR_n + RR)$
Type III (normal)(normal): $(DR_n + RR)(DR_n + RR)$


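The proportions of DD and DR in the three sub-groups of the F-generation are the following (to simplify the writing of the formulae, c is used as a substitute for $1 - a$):

Family-type I
The proportion of DD is

$$\frac{d^4 + d^2(a - d^2) + \tfrac14(a - d^2)^2}{a^2} = \frac{(a + d^2)^2}{4a^2}$$

The proportion of DR is

$$\frac{(a - d^2)\,d^2 + \tfrac12(a - d^2)^2}{a^2} = \frac{a^2 - d^4}{2a^2}$$

Family-type II
The proportion of DD is

$$\frac{(c - r^2)\,d^2 + \tfrac12(a - d^2)(c - r^2)}{2ac} = \frac{(c - r^2)(a + d^2)}{4ac}$$

The proportion of DR is

$$\frac{(c - r^2)\,d^2 + 2d^2r^2 + (a - d^2)(c - r^2) + (a - d^2)\,r^2}{2ac} = \frac{ac + d^2r^2}{2ac}$$

Family-type III
The proportion of DD is

$$\frac{\tfrac14(c - r^2)^2}{c^2} = \frac{(c - r^2)^2}{4c^2}$$

The proportion of DR is

$$\frac{\tfrac12(c - r^2)^2 + (c - r^2)\,r^2}{c^2} = \frac{c^2 - r^4}{2c^2}$$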
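As with the handedness model, these proportions can be verified by brute-force enumeration. The sketch below is illustrative only: it enumerates random matings between the recorded classes using the P-generation table (with $DR_a$ and $DR_n$ as labelled there) and compares the DD and DR shares with the closed forms above. The function names and the values d = 1/5, a = 1/10 are hypothetical.

```python
from fractions import Fraction

# Probability that a parent of each genotype transmits the D allele.
P_TRANSMIT_D = {"DD": Fraction(1), "DR_a": Fraction(1, 2),
                "DR_n": Fraction(1, 2), "RR": Fraction(0)}
TRAIT = ("DD", "DR_a")   # recorded as word-blind
NORMAL = ("DR_n", "RR")  # recorded as normal


def p_generation(d, a):
    """P-generation proportions (DR split into recorded DR_a and unrecorded DR_n)."""
    r = 1 - d
    return {"DD": d * d, "DR_a": a - d * d, "DR_n": 1 - a - r * r, "RR": r * r}


def child_shares(d, a, group1, group2):
    """(DD share, DR share) among children of random matings between group1 and group2."""
    freq = p_generation(d, a)
    total = dd = dr = Fraction(0)
    for g1 in group1:
        for g2 in group2:
            w = freq[g1] * freq[g2]
            p1, p2 = P_TRANSMIT_D[g1], P_TRANSMIT_D[g2]
            total += w
            dd += w * p1 * p2
            dr += w * (p1 * (1 - p2) + (1 - p1) * p2)
    return dd / total, dr / total


def closed_forms(d, a):
    """The DD and DR proportions given in the text, with c = 1 - a."""
    r, c = 1 - d, 1 - a
    return [
        ((a + d**2) ** 2 / (4 * a**2), (a**2 - d**4) / (2 * a**2)),
        ((c - r**2) * (a + d**2) / (4 * a * c), (a * c + d**2 * r**2) / (2 * a * c)),
        ((c - r**2) ** 2 / (4 * c**2), (c**2 - r**4) / (2 * c**2)),
    ]


d, a = Fraction(1, 5), Fraction(1, 10)  # illustrative: d^2 <= a <= d^2 + 2dr
enumerated = [child_shares(d, a, TRAIT, TRAIT),
              child_shares(d, a, TRAIT, NORMAL),
              child_shares(d, a, NORMAL, NORMAL)]
print(enumerated == closed_forms(d, a))  # True
```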

With knowledge of the proportions of the genotypes which give rise to trait-carriers in the F-generation, it is possible to formulate generalized expressions for the number of recorded trait-carriers in the three sub-groups of the F-generation. $N_1$, $N_2$ and $N_3$ designate the numbers of individuals in these groups, and $X_1$, $X_2$ and $X_3$ the numbers of recorded trait-carriers. According to the hypothesis, all dominant homozygotes are recorded as trait-carriers. Since b is made up of all $d^2$ dominant homozygotes and the recorded share of the $2dr$ heterozygotes, the coefficient

$$\frac{b - d^2}{2dr}$$

is employed to indicate the fraction of heterozygotes which are recorded as trait-carriers. The three equations are:

Family-type I:

$$N_1\left[\frac{(a + d^2)^2}{4a^2} + \frac{b - d^2}{2dr}\cdot\frac{a^2 - d^4}{2a^2}\right] = X_1 \qquad (7)$$

Family-type II:

$$N_2\left[\frac{(c - r^2)(a + d^2)}{4ac} + \frac{b - d^2}{2dr}\cdot\frac{ac + d^2r^2}{2ac}\right] = X_2 \qquad (8)$$

Family-type III:

$$N_3\left[\frac{(c - r^2)^2}{4c^2} + \frac{b - d^2}{2dr}\cdot\frac{c^2 - r^4}{2c^2}\right] = X_3 \qquad (9)$$

Only d-values consistent with the original hypothesis may be accepted when putting these formulae to practical use. When all heterozygotes are recorded as trait-carriers, a equals $d^2 + 2dr$, and when all heterozygotes are recorded as normals, a equals $d^2$. On the basis of these expressions the theoretical limits for d (and r) can be determined to be:

$$d_{\max} = \sqrt{a}, \qquad d_{\min} = 1 - \sqrt{c}$$
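Equations (7)-(9) can be sketched as a function returning the expected share of recorded trait-carriers in each family-type for a trial value of d; multiplying by $N_i$ gives the expected $X_i$ for a χ²-test as in the handedness analyses. As a consistency check, weighting the three shares by the mating frequencies $a^2$, $2ac$ and $c^2$ must return b, because the weighted DD and DR shares reduce to $d^2$ and $2dr$. The values a = 0.10, b = 0.12 and d = 0.2 are hypothetical; no dyslexia data are given here, and the function names are illustrative.

```python
import math


def het_coefficient(d, b):
    """(b - d^2) / (2dr): fraction of heterozygotes recorded as trait-carriers."""
    r = 1 - d
    return (b - d * d) / (2 * d * r)


def expected_shares(d, a, b):
    """Expected share of recorded trait-carriers in family-types I, II, III (eqs 7-9)."""
    r, c = 1 - d, 1 - a
    k = het_coefficient(d, b)
    s1 = (a + d**2) ** 2 / (4 * a**2) + k * (a**2 - d**4) / (2 * a**2)
    s2 = (c - r**2) * (a + d**2) / (4 * a * c) + k * (a * c + d**2 * r**2) / (2 * a * c)
    s3 = (c - r**2) ** 2 / (4 * c**2) + k * (c**2 - r**4) / (2 * c**2)
    return s1, s2, s3


def d_limits(a):
    """Theoretical limits d_min = 1 - sqrt(1 - a) and d_max = sqrt(a)."""
    return 1 - math.sqrt(1 - a), math.sqrt(a)


a, b, d = 0.10, 0.12, 0.2          # hypothetical values
d_min, d_max = d_limits(a)
s1, s2, s3 = expected_shares(d, a, b)
c = 1 - a
overall = a**2 * s1 + 2 * a * c * s2 + c**2 * s3
print(f"d must lie in [{d_min:.3f}, {d_max:.3f}]")
print(f"shares: {s1:.3f} {s2:.3f} {s3:.3f}; weighted total = {overall:.4f} (b = {b})")
```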


When an empirical material is analysed with the population-genetic method, the determinations of the proportions of trait-carriers in the P- and F-generations alone already give the limits within which the proportions of the two alleles may vary if the hypothesis is not to be rejected.

The collection of a material by which this modification of Hallgren’s hypothesis could be tested should not meet with any serious obstacles. It is not unlikely, however, that some other modification of the hypothesis may eventually prove to be more correct. For instance, it is not impossible that the dominant homozygotes and the heterozygotes are both receptive to the influences which reduce the manifestation of the gene. The measuring instrument would then be likely to fail in recording both homo- and heterozygotes. If this is the case, we must take into account that the genotype DD is also represented among those recorded as normals. The formulae will in that case be still more complicated.
Sequential Tests for the Detection of Linkage¹
University of Wisconsin


INFORMATION ON LINKAGE in man is accumulated as a succession of samples, each 
of which is typically small relative to the amount of data required to detect even 
moderately close linkage. The best method of analysis for such sequential samples, 
in the sense of requiring the least number of observations consistent with a given risk 
of error, has been found to be a sequential probability ratio test (Wald, 1947). 
It will now be shown that this test, in addition to minimizing the number of observa- 
tions, is in other respects a useful method for the detection of linkage in man. 
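For readers unfamiliar with Wald's procedure, a generic sequential probability ratio test can be sketched in the simplest setting: phase-known, fully informative meioses, each scored as recombinant or not, testing θ = 1/2 against a single alternative θ₁. This is a textbook SPRT under those simplifying assumptions, not the scoring method developed in this paper; θ₁ = 0.1 and α = β = 0.05 are arbitrary illustrative choices, and the function name is hypothetical. The log likelihood ratio is accumulated until it crosses Wald's boundaries ln((1 - β)/α) or ln(β/(1 - α)).

```python
import math


def sprt_linkage(meioses, theta1=0.1, alpha=0.05, beta=0.05):
    """Wald's SPRT of H0: theta = 1/2 against H1: theta = theta1.

    `meioses` is an iterable of booleans (True = recombinant), assumed
    phase-known and fully informative. Returns (decision, observations used).
    """
    upper = math.log((1 - beta) / alpha)   # crossing it: accept H1 (linkage)
    lower = math.log(beta / (1 - alpha))   # crossing it: accept H0 (theta = 1/2)
    step_rec = math.log(theta1 / 0.5)
    step_non = math.log((1 - theta1) / 0.5)
    llr, n = 0.0, 0
    for n, recombinant in enumerate(meioses, start=1):
        llr += step_rec if recombinant else step_non
        if llr >= upper:
            return "linkage", n
        if llr <= lower:
            return "no linkage", n
    return "continue sampling", n


print(sprt_linkage([False] * 10))   # ('linkage', 6)
print(sprt_linkage([True, True]))   # ('no linkage', 2)
```

With these settings six consecutive nonrecombinants suffice to accept linkage and two recombinants suffice to accept the null hypothesis, which illustrates how the number of observations adapts to the data.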


1. THE ASSUMPTIONS 


Consider two gene loci, G and T, not necessarily on the same chromosome. An individual of genotype GG’ TT’ may be of either of two possible phases, GT/G’T’ or G’T/GT’, corresponding to his formation by the union of GT and G’T’ gametes, or of G’T and GT’ gametes. If the G and T loci happen to be on the same chromosome, these two phases correspond to the usual meanings of coupling and repulsion. In any case, the frequencies of the four types of gametes produced by this individual, if he is GT/G’T’, will be

$$\tfrac12(1-\theta)\,GT,\quad \tfrac12(1-\theta)\,G'T',\quad \tfrac12\theta\,GT',\quad \tfrac12\theta\,G'T,$$

whereas, if he is G’T/GT’, they will be

$$\tfrac12\theta\,GT,\quad \tfrac12\theta\,G'T',\quad \tfrac12(1-\theta)\,GT',\quad \tfrac12(1-\theta)\,G'T,$$

where θ is the probability of recombination between the two loci (0 ≤ θ ≤ 1; nearly always, θ ≤ 1/2).
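The two gamete distributions can be written down directly. The sketch below also forms the probability of a sibship's gametes when the phase is unknown, assuming (as an illustration, not as one of the paper's stated assumptions) that both phases are equally likely a priori and that the gamete each child received from the doubly heterozygous parent can be identified, as when the other parent is a double homozygote. The function names and the example sibship are hypothetical.

```python
import math

PHASES = ("GT/G'T'", "G'T/GT'")


def gamete_freqs(theta, phase):
    """Gamete frequencies of a GG'TT' individual of the given phase."""
    parental, recombinant = (1 - theta) / 2, theta / 2
    if phase == "GT/G'T'":
        return {"GT": parental, "G'T'": parental, "GT'": recombinant, "G'T": recombinant}
    if phase == "G'T/GT'":
        return {"GT": recombinant, "G'T'": recombinant, "GT'": parental, "G'T": parental}
    raise ValueError(f"unknown phase {phase!r}")


def sibship_probability(theta, gametes):
    """Probability of the children's gametes, averaging over two equally likely phases (assumed)."""
    return sum(0.5 * math.prod(gamete_freqs(theta, ph)[g] for g in gametes)
               for ph in PHASES)


children = ["GT", "G'T'", "GT", "G'T'"]  # hypothetical sibship
ratio = sibship_probability(0.0, children) / sibship_probability(0.5, children)
print(ratio)  # 8.0
```

Here four children of one parental type are eight times more probable under complete linkage (θ = 0) than under free recombination (θ = 1/2); ratios of this kind are the raw material of the probability ratio tests discussed below.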

Now, a sufficient set of assumptions for a “linkage” test is the following: 

1. The parental genotypes are known with certainty, except for phase. 

2. The segregation ratios are not disturbed by incomplete penetrance or differ- 
ential viability. 

3. The method of ascertainment and selection of families is properly allowed for. 
With this postulational basis, the null hypothesis to be tested is that “the three 
assumptions are correct and the recombination fraction in the population equals 
1/2”. Some of the alternative hypotheses are: 

1. Incomplete penetrance or differential viability.

2. Biased ascertainment or selection of families.

3. Nonrandom segregation of nonhomologous chromosomes.

4. Co-existence of the two loci on the same chromosome (linkage). 


Received May 28, 1955. 
Although a distinction between nonrandom chromosome segregation and linkage
(which is presumably much the commoner of the two phenomena) will not be possible 
until the human linkage groups are better known, it should not be difficult to recog- 
nize the other disturbing factors in data that have been carefully collected and 
reported. 

The above assumptions are rather stringent and must be examined in detail. 
Cases to be treated in this paper include incomplete ascertainment, uncertain 
parental genotypes, and incomplete penetrance. 

No attempt will be made to treat “linkage” tests in which the basis of either 
character is not a single Mendelian factor. If the basis of one or both conditions is 
multifactorial or unknown, “linkage” is at best ambiguous and generally cannot be distinguished from any other phenotypic correlation which varies among families. The exploration of these complicated situations may be of some interest, but to include such characters on fancied “linkage” maps, as some authors have done, is
to depreciate the linkage maps that have been determined with some precision in 
other organisms. 

Since even the most conservative set of assumptions confounds linkage with 
other phenomena, the burden of proof is on the investigator who asserts that a 
particular example of linkage-like effects is evidence of true linkage. When two genes 
satisfy regular Mendelian ratios, however, it is convenient to denote such effects as 
linkage, with the assurance that this designation is rather precise, and that its 
precision will increase as the human linkage map is developed. 


2. CURRENT TEST PROCEDURES 


The three methods most commonly used to detect human linkage are the method 
of efficient scores (u scores), the Penrose sib-pair method, and the probability ratio 
method of Haldane and Smith (1947). Smith (1953) has recently shown that they 
are all really different forms of the nonsequential probability ratio test. 

Valid scoring procedures were first applied to human linkage by Bernstein (1931), 
who showed that each family can be assigned a score whose sum, expected value, 
and variance provide a test of the null hypothesis in any body of data that is suffi- 
ciently large for the distribution of the total score to be nearly normal. Bernstein’s 
scores were further developed by Hogben (1934) and Haldane (1934), but the 
evolution by Fisher of a maximum likelihood scoring procedure made these methods 
obsolete. Fisher (1935) was able to show that his u scores are more efficient than 
Bernstein’s scores for all linkage intensities and are, in fact, fully efficient in the 
limit for loose linkage. Finney (1940 et seq.) has treated a great variety of cases by 
u scores, which are now commonly considered to be the method of choice whenever 
the amount of data is large and the families are not grouped into large pedigrees. 
However, u scores have certain disadvantages, some of which Smith (1953) has 
summarized as follows: 

1. Although u scores are very easy to use when the parental genotype is com- 
pletely known (except for phase), the calculation of the variance may be intractable 
when the parental genotypes are unknown. In large samples this can be circum- 
vented by the use of a simple approximation (Smith, 1953). 
2. The u scores are fully efficient only in the limit for loose linkage, which it is
not practicable to detect. An ideal test would be efficient for moderate rather than 
loose linkage. 

3. Information about linkage can be greatly increased by using data involving 3 
or more generations. It is not feasible to extract this information by u scores. 

4. The assumption of normality for the total score may be far from true for 
moderate sample sizes. Haldane (1946) has developed a normalizing transformation 
for such cases, and shown that in one instance an exact test fails to confirm the 
significance of a u score test. 

The sib-pair method of Penrose has sometimes been recommended as an alter- 
native to u scores when the parental genotypes are unknown. The investigations of 
Finney (1942) do not support this recommendation, since in his data the sib-pair 
method extracted only a small fraction of the information that could be obtained 
by u scores. However, when one of the test characters is a rare recessive trait, the 
sib-pair method fares somewhat better (Penrose, 1953). A serious disadvantage of 
the method is that it may be quite inexact when, as the current procedure requires, 
a family of size s > 2 is partitioned into all s(s − 1)/2 possible pairs (Penrose,
1953; Smith, 1953). Smith (1953) has shown how a large-sample correction for non- 
independence of sib pairs may be applied, but its use destroys the principal ad- 
vantage of the method, that of arithmetical simplicity. Finney (1941a) has pointed 
out that the Penrose sib-pair method is particularly sensitive to heterogeneity in 
gene frequencies when different populations are pooled. The sib-pair method can be 
applied to traits whose mode of inheritance is unknown, but then the term “linkage” 
is scarcely appropriate. 

The probability ratio test of Haldane and Smith (1947) was devised to extract 
information from families and pedigrees without making the assumption of nor- 
mality that is required by the maximum likelihood method. Their test depends on 
the theorem that the expected value of a probability ratio is 1 on the null hypothesis, 
regardless of the alternative hypothesis (Wald, 1947). Since this is true for any simple 
hypothesis, it must be true for any composite hypothesis, which is merely a weighted 
average of simple hypotheses such that the sum of the weights is 1. Let λ be a probability ratio for the test of the null hypothesis that θ = 1/2 against some alternative hypothesis. Then, on the null hypothesis, the inequality

$$\lambda \ge A \qquad (A > 1)$$

cannot occur with probability greater than 1/A, since if it did, this in itself would be enough to raise the mean value E(λ) above 1. Therefore the occurrence of a value of λ greater than A is at least as strong evidence against the null hypothesis as a significance level of 1/A. Clearly this method of analysis has several advantages,
among them its reliability in small as well as large samples, its dependence solely on 
elementary laws of probability, and the ease with which all kinds of families and 
pedigrees may be combined. However, the method is conservative, and a recent 
modification (average backward odds) is less efficient (Smith, 1953). 

The three common methods of linkage detection in man do not exhaust the pro- 
cedures that have been proposed, but of the current tests, the u statistics of Fisher 
and Finney and the probability ratio method of Haldane and Smith are the best
alternatives to sequential tests. 


3. SEQUENTIAL TEST PROCEDURES 


Let f(y; θ) denote the distribution of a random variable y, where θ is the recombination fraction and successive observations on y are indicated by y1, y2, ..., etc. The observation y = 1 signifies that f(y; θ) is of the form f(1; θ), and so on. For example, double backcross families of size 2 have two possible forms of the function f(y; θ), which may arbitrarily be specified by y = 1 and y = 2. Under the conditions of Section 8 below,

$$f(1;\theta) = \theta^2 + (1-\theta)^2,$$

$$f(2;\theta) = 2\theta(1-\theta).$$

Thus, a particular sample of 3 independent sib pairs might be y1, y2, y3 = 2, 1, 2, and the probability of this sample is f(2; θ)f(1; θ)f(2; θ).

Let H0 be the null hypothesis that θ = 1/2 and H1 be the alternative hypothesis that θ = θ1. The probability that a sample y1, y2, ..., ym is obtained is given by

$$p_{1m} = f(y_1;\theta_1)\cdots f(y_m;\theta_1)$$

when H1 is true, and by

$$p_{0m} = f(y_1;1/2)\cdots f(y_m;1/2)$$

when H0 is true. The sequential test (Wald, 1947) employs the probability ratio p1m/p0m and two positive numbers A and B, with A > 1 and B < 1. For purposes of practical computation it is much more convenient to work with the logarithm of this ratio rather than the ratio itself, since

$$\log\frac{p_{1m}}{p_{0m}} = \log\frac{f(y_1;\theta_1)}{f(y_1;1/2)} + \cdots + \log\frac{f(y_m;\theta_1)}{f(y_m;1/2)}.$$

Let zi denote the ith term in this sum, viz.,

$$z_i = \log\frac{f(y_i;\theta_1)}{f(y_i;1/2)}.$$


The test procedure is carried out as follows, the quantities zi (i = 1, 2, ...) being used: with each accession of data (consisting of one or more families or pedigrees), the cumulative sum z1 + ... + zm is computed. If

$$\log B < z_1 + \cdots + z_m < \log A,$$

the evidence on linkage is not decisive, and judgment with the preassigned significance level and power must be suspended until more data can be collected. If

$$z_1 + \cdots + z_m \ge \log A,$$

there is significant evidence for linkage under the assumptions of the test. If

$$z_1 + \cdots + z_m \le \log B,$$

the recombination fraction is significantly greater than θ1.
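The procedure can be sketched in a few lines of Python. This is a minimal illustration, not the author's own computation: it assumes that every observation is a double backcross sib pair scored y = 1 or y = 2 as in the example above, and it sets A and B from α and β by Wald's usual approximations A ≈ (1 − β)/α, B ≈ β/(1 − α). The function names and decision labels are hypothetical.

```python
import math

def sib_pair_f(theta: float) -> dict[int, float]:
    """f(y; theta) for double backcross sib pairs (y = 1 or 2, as in the text)."""
    return {1: theta**2 + (1 - theta)**2, 2: 2 * theta * (1 - theta)}

def sequential_linkage_test(observations, theta1: float, alpha: float, beta: float):
    """Wald sequential test of theta = 1/2 against theta = theta1.

    observations: iterable of y values (1 or 2), processed in order.
    Returns (decision, number of observations used).
    """
    log_a = math.log((1 - beta) / alpha)   # Wald's approximation for log A
    log_b = math.log(beta / (1 - alpha))   # Wald's approximation for log B
    f1, f0 = sib_pair_f(theta1), sib_pair_f(0.5)
    total, m = 0.0, 0
    for m, y in enumerate(observations, start=1):
        total += math.log(f1[y] / f0[y])   # add z_m, the lod of the m-th pair
        if total >= log_a:
            return "linkage", m
        if total <= log_b:
            return "theta > theta1", m
    return "continue", m                   # not yet decisive: collect more data
```

For example, with θ1 = .10, α = .001 and β = .01, an unbroken run of 14 pairs of type y = 1 reaches the upper boundary, while 5 pairs of type y = 2 reach the lower one.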

More data can always be used following a sequential test, either to estimate a 
significant linkage or to detect or exclude linkage in the range θ1 < θ < 1/2, but this latter enterprise may be unprofitable if a stringent choice was made for θ1.

The constants A and B are related to α, the probability of rejecting H0 when H0 is true (a Type I error), and β, the probability of rejecting H1 when H1 is true (a Type II error). In practice, two simple approximations are used to determine A and B:

$$A \approx \frac{1-\beta}{\alpha}, \qquad B \approx \frac{\beta}{1-\alpha}.$$

Wald (1947) has shown that these approximations cannot result in any appreciable increase in the value of either α or β, and that they may be used to obtain expressions for the power function P(θ) and the average sample number function E(n) of a sequential test. These two functions determine the best sequential test for a particular purpose and the extent of its superiority over nonsequential procedures. Requirements to impose on these functions are suggested by the probability distribution of θ.


4. THE PROBABILITY DISTRIBUTION OF THE RECOMBINATION FRACTION θ


Haldane and Smith (1947) have suggested “chiefly from a comparison with the 
known linkage values of Drosophila” that it may not be a bad approximation to 
assume that the recombination fraction for linked genes has a uniform distribution 
from 0 to 1/2. The distribution may also be arrived at more pedantically. 

Consider a chromosome with genetic map length of L morgans, along which 
gene loci are distributed uniformly. We need not assume that the genes are dis- 
tributed uniformly along the physical chromosome, only that their locations on 
the linkage map are so distributed. Choose two loci at random with locations C1 and C2, where C1 is the first locus chosen. The quantity w = |C1 − C2| is called the map distance between the two loci (0 ≤ w ≤ L). The cumulative distribution function of w may be represented on (C1/L, C2/L) coordinates by the area within a unit square between the lines w = C2 − C1 and w = C1 − C2, or

$$F(w) = \frac{L^2 - (L-w)^2}{L^2} = \frac{2Lw - w^2}{L^2}.$$
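Kosambi (1944) has shown that the map distance w is related to the recombination fraction θ as

$$w = \frac14\log\frac{1+2\theta}{1-2\theta},$$

assuming that the coincidence is 2θ. By this approximation

$$F(\theta) = \frac{8L\log\frac{1+2\theta}{1-2\theta} - \left(\log\frac{1+2\theta}{1-2\theta}\right)^2}{16L^2},$$

and the probability distribution of θ for linked genes, obtained by differentiating F(θ), is

$$f(\theta) = \frac{4L - \log\frac{1+2\theta}{1-2\theta}}{2L^2\left(1-4\theta^2\right)}, \qquad 0 \le \theta \le \theta' < 1/2,$$

$$f(\theta) = 0 \quad \text{elsewhere}.$$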


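The critical point θ′ beyond which f(θ) = 0 is determined by the equation

$$L = \frac14\log\frac{1+2\theta'}{1-2\theta'}, \qquad\text{or}\qquad \theta' = \tfrac12\tanh 2L = \frac{1-e^{-4L}}{2\left(1+e^{-4L}\right)}.$$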


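We may verify that f(θ) is a density function over the interval 0 to θ′, since at θ = θ′ we have log[(1 + 2θ′)/(1 − 2θ′)] = 4L, and therefore

$$F(\theta') = \frac{8L\cdot 4L - (4L)^2}{16L^2} = 1.$$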
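A quick numerical check of this density may be useful. The sketch below assumes natural logarithms, as in Kosambi's mapping function, and evaluates f(θ) = [4L − log((1 + 2θ)/(1 − 2θ))]/[2L²(1 − 4θ²)] on [0, θ′], with θ′ = ½ tanh 2L, integrating it by the midpoint rule. The helper names and the step count are hypothetical choices for illustration.

```python
import math

def kosambi_density(theta: float, L: float) -> float:
    """f(theta) for two loci placed uniformly on a linkage map of L morgans (Kosambi mapping)."""
    theta_max = 0.5 * math.tanh(2 * L)                 # theta', beyond which f = 0
    if not 0.0 <= theta <= theta_max:
        return 0.0
    g = math.log((1 + 2 * theta) / (1 - 2 * theta))   # equals 4w, four times the map distance
    return (4 * L - g) / (2 * L**2 * (1 - 4 * theta**2))

def total_mass(L: float, steps: int = 20000) -> float:
    """Midpoint-rule integral of f over [0, theta']; should be close to 1."""
    theta_max = 0.5 * math.tanh(2 * L)
    h = theta_max / steps
    return sum(kosambi_density((i + 0.5) * h, L) for i in range(steps)) * h
```

At θ = 0 the density equals 2/L, so for L = 1 it starts at 2, which is the height of a uniform density on (0, 1/2).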


Fig. 1. The distribution of the recombination fraction θ for chromosomes of length L (curves labeled include L = .50 morgans and L = 1.00).
TABLE 1.—THE DISTRIBUTION OF GENETIC MAP LENGTHS (L) IN DIFFERENT ORGANISMS

Entries are n(L), the number of chromosomes of map length L (morgans).

Source | L = .25 | L = .50 | L = .75 | L = 1.00 | L = 1.25 | L = 1.50 | L = 2.00 | L = 2.50 | L = 3.00 | Σn(L)L²/[Σn(L)L]²

Drosophila¹ | — | — | 1 | 2 | — | — | — | — | — | .345

Corn (Zea)* | 1 1 3 2]; 2] 1 — | 197 

Mouse Be 15 — = | = 46 13 3 2 | .058 
¹ Linkage map, neglecting the dot-like IVth chromosome, L_IV = .002 (Bridges and Brehme, 1944).

² Linkage map (Rhoades, 1950).

³ Based on chiasma frequency in random chromosomes, assuming L = (chiasma frequency)/2 (Crew and Koller, 1932). Recent data (Carter, 1955) suggest that the average value of L in the mouse is nearer to unity than here indicated, hence the distribution g(θ) in Figure 2 should presumably be even closer to uniformity.


Figure 1 shows f(θ) corresponding to different values of L. For chromosomes of length near unity (100 centimorgans) the distribution of θ is almost uniform. In fact, the recombination fraction has an exactly uniform distribution for chromosomes of unit genetic length according to the simple mapping function θ = w − ½w² (0 ≤ w ≤ 1), for since F(w) = 2w − w², the distribution of θ is

$$F(\theta) = 2\theta, \qquad f(\theta) = 2, \qquad 0 \le \theta \le 1/2.$$


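Actually chromosomes of unit length are nearly modal in the few higher organisms whose genetic maps are known. Table 1 gives the distribution of L for Drosophila, corn, and (very approximately) for the mouse. On the assumption of a uniform density of loci on the chromosome map, the probability distribution of the recombination fraction between two randomly chosen loci on the same chromosome is

$$g(\theta) = \frac{\sum n(L)\,L^2 f_L(\theta)}{\sum n(L)\,L^2},$$

where f_L(θ) is the distribution f(θ) for a chromosome of length L and n(L) is the number of such chromosomes in Table 1.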


Figure 2 shows that in all three species g(θ) is closely approximated by a uniform distribution, and that the greatest departure from this approximation is for values of θ close to 1/2, which in practice could seldom be distinguished from independent assortment. The distribution g(θ) is probably much the same in man, where the
average genetic length, based on mean chiasma frequency, may be close to unity 
(Schultz, unpublished; cited by Neel, 1949). 

Fig. 2. The distribution of the recombination fraction θ for linked genes in three different species (curves for Drosophila, corn (Zea), and the mouse, the last based on chiasma frequency).

Table 1 may also be used to compute the probability φ that two randomly chosen loci be on the same chromosome. If the number of loci per chromosome is proportional to L,

$$\phi = \frac{\sum_{i=1}^{n} L_i^2}{\left(\sum_{i=1}^{n} L_i\right)^2},$$

where n is the haploid number of chromosomes. If all chromosomes are of equal length, φ = 1/n, and for the organisms tabulated this turns out to be a good approximation. In Drosophila, neglecting the dot-like IVth chromosome, n = 3, φ = .345; in corn, n = 10, φ = .117; in the mouse, n = 20, φ = .058, or φ = .064 if pachytene length is proportional to L (Slizynski, 1949). In man, with 23 autosomes, the frequency of autosomal linkage may reasonably be taken as φ = .05, so that the distribution of recombination values in man may be approximated as follows:

$$f(\theta) = .10, \qquad 0 \le \theta < 1/2,$$

$$\Pr(\theta = 1/2) = .95,$$

$$f(\theta) = 0 \quad \text{elsewhere}.$$


5. THE CHOICE OF A SEQUENTIAL TEST 


The validity of a sequential test does not depend on the accuracy of these ap- 
proximations, but they do suggest criteria by which a suitable sequential test may 
be selected. We are especially anxious to avoid the assertion that two genes are 
linked when in fact they are not, since a misleading linkage map is worse than no 
linkage map at all. One source of linkage-like effects can be nearly eliminated by 
considering only pairs of loci which satisfy our assumption that the expected segrega- 
tion ratios for both loci are realized in the population sampled. However, cases of 
apparent linkage will still be made up in part of true linkages, in part of Type I 
errors. If the prior probability of linkage is φ = .05, then the posterior probability that a case of apparent linkage be a Type I error is

$$\rho = \frac{.95\,\alpha}{.95\,\alpha + .05\,P} = \frac{19\alpha}{19\alpha + P},$$

where P is the average power of the test, or the probability of detecting linkage when it is present. R. S. Krooth (personal communication) has termed ρ the reliability and P the sensitivity of a linkage test. Calculations of ρ for different values of α and P show that the usual values of α are inadequate in this problem, and that for the posterior probability of a Type I error to be less than .05, α must be about .002 when P = .95, .001 when P = .60 and .0005 when P = .20 (cp. Haldane, 1934).
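The posterior probability ρ is a one-line computation. A minimal sketch, assuming the prior probability of linkage φ = .05 used in the text (the function name is hypothetical); with φ = .05 the expression (1 − φ)α/[(1 − φ)α + φP] reduces to 19α/(19α + P).

```python
def posterior_type_one_error(alpha: float, avg_power: float, prior_linkage: float = 0.05) -> float:
    """rho = (1 - phi) * alpha / [(1 - phi) * alpha + phi * P]."""
    false_signals = (1.0 - prior_linkage) * alpha   # unlinked pairs declared linked
    true_signals = prior_linkage * avg_power        # linked pairs correctly detected
    return false_signals / (false_signals + true_signals)

for alpha, power in [(0.002, 0.95), (0.001, 0.60), (0.0005, 0.20)]:
    print(alpha, power, round(posterior_type_one_error(alpha, power), 3))
```

Each of the three (α, P) pairs quoted above gives ρ just under .05.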

Having placed the requirement on a that it be small enough to reduce the pos- 
terior probability of a Type I error to .05, we impose a second condition on the 
power function of the test. To be at all useful, the test must have a power close to unity for values of θ near zero. We are at liberty to choose θ1, the formal alternative to θ = 1/2, as near to 1/2 as we please, and the only adverse effect of this choice is to increase the average sample number. On this reasoning it seems appropriate to let θ1 take the largest value which is likely to give a significant result in a practicably large body of data, and to consider the average sample number function as a basis for the selection of a sequential test.

As an application of this argument, consider four sequential test procedures 
defined by the relations 


(1) θ1 = .05, A = 2000, B = .01, θ0 = 1/2
(2) θ1 = .10, A = 1000, B = .01, θ0 = 1/2
(3) θ1 = .20, A = 1000, B = .01, θ0 = 1/2
(4) θ1 = .30, A = 1000, B = .01, θ0 = 1/2


and assume that the data consist entirely of double backcross sibships of size 2, sampled under the conditions of §8 below. The probability can take only the values f(1; θ) = θ² + (1 − θ)², corresponding to a sib pair that is either concordant in both traits or discordant in both, and f(2; θ) = 2θ(1 − θ), which corresponds to a sib pair that is concordant in one trait and discordant in the other. Following Wald (1947) and assuming that the excess over the boundaries at the termination of the test can be neglected, we obtain a good approximation to the power function P(θ) by solving two equations for various values of h:

$$\sum_{y=1}^{2} f(y;\theta)\left[\frac{f(y;\theta_1)}{f(y;1/2)}\right]^{h} = 1, \qquad P(\theta) = \frac{1-B^{h}}{A^{h}-B^{h}}.$$


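From the power function, again neglecting the excess over the boundaries, we obtain the average sample number function as

$$E_\theta(n) = \frac{P(\theta)\log A + [1-P(\theta)]\log B}{E_\theta(z)},$$

where

$$E_\theta(z) = \sum_{y} f(y;\theta)\log\frac{f(y;\theta_1)}{f(y;1/2)}.$$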
In particular,

$$E_{\theta_1}(n) = \frac{(1-\beta)\log A + \beta\log B}{E_{\theta_1}(z)}$$

and

$$E_{1/2}(n) = \frac{\alpha\log A + (1-\alpha)\log B}{E_{1/2}(z)}.$$
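Both functions are easy to trace numerically for double backcross sib pairs. The sketch below makes the same Wald approximations (excess over the boundaries neglected). Because y takes only two values, the first equation can be solved explicitly for q = f(2; θ) at each h (h ≠ 0, h ≤ 1), and θ is then recovered from q = 2θ(1 − θ). This explicit inversion is a convenience of the sketch rather than a step stated in the paper, and the function names are hypothetical.

```python
import math

def sib_pair_f(theta: float) -> dict[int, float]:
    """f(1; theta) and f(2; theta) for double backcross sib pairs."""
    return {1: theta**2 + (1 - theta)**2, 2: 2 * theta * (1 - theta)}

def power_point(h: float, theta1: float, A: float, B: float) -> tuple[float, float]:
    """Return (theta, P(theta)) on the power curve for one value of h (h != 0, h <= 1)."""
    f1, f0 = sib_pair_f(theta1), sib_pair_f(0.5)
    a, b = f1[1] / f0[1], f1[2] / f0[2]
    q = (1 - a**h) / (b**h - a**h)                       # solves (1 - q) a^h + q b^h = 1
    theta = (1 - math.sqrt(max(0.0, 1 - 2 * q))) / 2    # root of 2 theta (1 - theta) = q below 1/2
    power = (1 - B**h) / (A**h - B**h)
    return theta, power

def average_sample_number(theta: float, theta1: float, A: float, B: float, power: float) -> float:
    """E_theta(n) from the power P(theta); undefined where E_theta(z) = 0."""
    f, f1, f0 = sib_pair_f(theta), sib_pair_f(theta1), sib_pair_f(0.5)
    e_z = sum(f[y] * math.log(f1[y] / f0[y]) for y in (1, 2))
    return (power * math.log(A) + (1 - power) * math.log(B)) / e_z

theta, p_null = power_point(1.0, 0.30, 1000.0, 0.01)
print(theta, p_null, average_sample_number(theta, 0.30, 1000.0, 0.01, p_null))
```

For test (4), h = 1 returns θ = 1/2 with P ≈ .001 and an average sample number near 354 under the null hypothesis, in line with the 355 listed for this test; h = −1 returns θ = θ1 with P ≈ .99.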


The power functions and average sample number functions for the four test 
procedures are plotted in figures 3 and 4, the information from which is summarized 
in table 2. All four tests have power greater than .99 for values of θ less than .05 and power less than .03 for values of θ greater than .40. In the intervening range, the first test has good power at θ = .10, the second is moderately good at θ = .20, the third has appreciable power at θ = .30, and the fourth is good for all values of θ less than .35. The value of α has been taken so as to keep the posterior probability of a Type I error (ρ) nearly constant and less than .05, provided that the assumptions of the previous sections are satisfied. The average power P increases from .28 to .71, and the average sample number, which represents the cost of this gain in power, increases from 10 to 355.

The investigator will probably seldom have need for sequential tests outside the 
above range. A test so insensitive as not to detect virtually all cases of close linkage (θ < .05) is of little use, while an increase in sensitivity much beyond θ1 = .30 requires a prohibitively large average sample number: for example, when θ = 1/2, the test θ1 = .40, A = 1000, B = .01 requires an average sample number of 5700 double backcross sib pairs.
Fig. 3. The power function P(θ) for different values of θ1. Double backcross sibships of size 2.
6. THE NUMBERS OF OBSERVATIONS REQUIRED BY FIXED-SAMPLE-SIZE TESTS AND
SEQUENTIAL TESTS 


The exposition so far has considered criteria by which a sequential test may be 
chosen, and has suggested a battery of four tests which should be adequate for 
most purposes. We still require, however, to select among these procedures and, 
more immediately, to determine whether a sequential test is so superior to current 
fixed-sample-size tests in efficiency, computational simplicity, or exactness that the 
choice of a sequential test has more than academic interest. 

288 NEWTON E. MORTON

TABLE 2.—CHARACTERISTICS OF FOUR SEQUENTIAL TESTS

θ1 = the formal alternative to the null hypothesis that θ = 1/2.
α = the probability of rejecting the null hypothesis when θ = 1/2.
β = the probability of accepting the null hypothesis when θ = θ1.
P(θ) = the probability of detecting linkage when the true recombination fraction is θ.
P = the probability of detecting linkage when θ is uniformly distributed between 0 and 1/2: $P = 2\int_0^{1/2} P(\theta)\,d\theta$.
ρ = the probability that a significant “linkage” be a Type I error: $\rho = 19\alpha/(19\alpha + P)$.
E(n) = the average number of double backcross sibships of size 2 required to terminate the test: $E(n) \approx .10\int_0^{1/2} E_\theta(n)\,d\theta + .95\,E_{1/2}(n)$.

θ1 | α | β | P(.10) | P(.20) | P(.30) | P(.35) | P | ρ | E(n)
.05 | .0005 | .01 | .86 | .10 | .006 | .002 | .28 | .032 | 10
.10 | .001 | .01 | .99 | .46 | .02 | .006 | .39 | .046 | 19
.20 | .001 | .01 | >.999 | .99 | … | .025 | .56 | .032 | 68
.30 | .001 | .01 | >.999 | >.999 | .99 | .64 | .71 | .026 | 355

For a start, we may calculate the number of independent double backcross sib pairs required by current tests of strength (α, β). In the case of u statistics there are two possible scores, 1 and −1, with frequencies θ² + (1 − θ)² and 2θ(1 − θ) respectively (Finney, 1940). The expected value of the score is $\mu_\theta = (1-2\theta)^2$, with variance $\sigma_\theta^2 = (1-\mu_\theta)(1+\mu_\theta)$. (Note that these symbols designate the expected value and variance of the score, not of θ.) If the sample size is small, it may be estimated by trial and error from a table of the cumulative binomial distribution, using the parameters p1 = 2θ1(1 − θ1) and p0 = 2θ0(1 − θ0) = 1/2. If the sample is sufficiently large, the distribution of the sample mean will be nearly normal, and the following conditions will determine n(α, β), the required sample number:

$$G\!\left(\frac{\sqrt{n}\,(d-\mu_{1/2})}{\sigma_{1/2}}\right) = 1-\alpha, \qquad G\!\left(\frac{\sqrt{n}\,(d-\mu_{\theta_1})}{\sigma_{\theta_1}}\right) = \beta,$$

where d is a preassigned constant defining the critical region of the test and

$$G(t) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t} e^{-x^2/2}\,dx.$$

If we let t0 be the value for which G(t0) = 1 − α, and t1 be the value for which G(t1) = β, and observe that μ_{1/2} = 0 and σ_{1/2} = 1, then the two conditions may be written as

$$\sqrt{n}\,d = t_0, \qquad \sqrt{n}\,(d-\mu_{\theta_1}) = \sigma_{\theta_1} t_1.$$

Solving the above equations, we obtain

$$n(\alpha,\beta) = \frac{\left(t_0 - t_1\sqrt{1-\mu_{\theta_1}^2}\right)^2}{\mu_{\theta_1}^2}.$$
If this expression is not an integer, then, as in all formulae determining fixed sample size, n(α, β) is the smallest integer in excess (Wald, 1947).
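The formula for n(α, β) is easy to evaluate with a standard normal quantile function. A minimal sketch using Python's standard-library statistics.NormalDist; rounding up implements the "smallest integer in excess" rule, and the function name is hypothetical.

```python
import math
from statistics import NormalDist

def n_u_score(theta1: float, alpha: float, beta: float) -> int:
    """Fixed sample size n(alpha, beta) for the u score test on double backcross sib pairs."""
    normal = NormalDist()                 # standard normal, whose cdf is G(t)
    t0 = normal.inv_cdf(1 - alpha)        # G(t0) = 1 - alpha
    t1 = normal.inv_cdf(beta)             # G(t1) = beta (negative for small beta)
    mu = (1 - 2 * theta1) ** 2            # expected score under theta1
    sigma = math.sqrt(1 - mu**2)          # its standard deviation
    n = (t0 - t1 * sigma) ** 2 / mu**2
    return math.ceil(n)                   # smallest integer in excess

print(n_u_score(0.30, 0.001, 0.01), n_u_score(0.05, 0.0005, 0.01))
```

For θ1 = .30, α = .001, β = .01 this gives 1134, and for θ1 = .05, α = .0005, β = .01 it gives 34, agreeing with the n(α, β) column of Table 3.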

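In the case of the probability ratio test of Haldane and Smith (1947), there are two possible values of the logarithm of the probability ratio, namely

$$z' = \log\frac{f(1;\theta_1)}{f(1;1/2)} = \log\left(2 - 4\theta_1 + 4\theta_1^2\right)$$

and

$$z'' = \log\frac{f(2;\theta_1)}{f(2;1/2)} = \log\left[4\theta_1(1-\theta_1)\right].$$

The expected value of z is $\mu_\theta = z' - 2\theta(1-\theta)(z'-z'')$, with variance

$$\sigma_\theta^2 = (z')^2 - 2\theta(1-\theta)(z'-z'')(z'+z'') - \mu_\theta^2.$$

If d_z denotes the critical value of the mean lod per family, the first condition determining the sample size is

$$n\,d_z = \log(1/\alpha),$$

and if n is sufficiently large, the second condition becomes

$$G\!\left(\frac{\sqrt{n}\,(d_z-\mu_{\theta_1})}{\sigma_{\theta_1}}\right) = \beta, \qquad\text{i.e.}\qquad \sqrt{n}\,(d_z-\mu_{\theta_1}) = \sigma_{\theta_1} t_1.$$

Solving for n, we obtain

$$n = n^*(\alpha,\beta) = \left[\frac{-t_1\sigma_{\theta_1} + \sqrt{t_1^2\sigma_{\theta_1}^2 + 4\mu_{\theta_1}\log(1/\alpha)}}{2\mu_{\theta_1}}\right]^2.$$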


For the Haldane-Smith test the true significance level ᾱ is less by a varying amount than the nominal level α, so that in this respect the test is conservative. Smith (1953) calculated that the median ᾱ is approximately α/10 for α = .001. The error of the normal approximation in determining n(α, β) and n*(α, β) is in the opposite direction, since the alternative distribution is skewed toward θ = 1/2, and therefore β and n tend to be underestimated. This error is negligible unless n is very small, and in table 3, which gives the results of these calculations, the smallest value of n(α, β) is in close agreement with an exact determination from the cumulative binomial distribution.
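The same can be done for the Haldane-Smith test. Substituting d_z = log(1/α)/n into the second condition gives a quadratic in √n, and the sketch below takes its positive root; this is one way to carry out the "solving for n" step. Natural logarithms are assumed throughout (any base works if it is used consistently for z′, z″ and log(1/α)), and the function name is hypothetical.

```python
import math
from statistics import NormalDist

def n_haldane_smith(theta1: float, alpha: float, beta: float) -> int:
    """Approximate fixed sample size n*(alpha, beta) for double backcross sib pairs."""
    q1 = 2 * theta1 * (1 - theta1)                  # f(2; theta1)
    z_1 = math.log(2 - 4 * theta1 + 4 * theta1**2)  # z'  = log f(1; theta1)/f(1; 1/2)
    z_2 = math.log(4 * theta1 * (1 - theta1))       # z'' = log f(2; theta1)/f(2; 1/2)
    mu = z_1 - q1 * (z_1 - z_2)                     # mean lod under theta1
    sigma = math.sqrt(z_1**2 - q1 * (z_1 - z_2) * (z_1 + z_2) - mu**2)
    t1 = NormalDist().inv_cdf(beta)                 # G(t1) = beta
    c = math.log(1 / alpha)                         # n * d_z = log(1/alpha)
    # sqrt(n) is the positive root of mu*x**2 + sigma*t1*x - c = 0
    root_n = (-t1 * sigma + math.sqrt((t1 * sigma) ** 2 + 4 * mu * c)) / (2 * mu)
    return math.ceil(root_n**2)

print(n_haldane_smith(0.30, 0.001, 0.01))
```

For θ1 = .30, α = .001, β = .01 this returns 1740; across the tests of Table 3 the ratio n(α, β)/n*(α, β) stays near 2/3.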


TABLE 3.—THE AVERAGE SAMPLE NUMBER E(n) FOR A SEQUENTIAL TEST, COMPARED WITH THE FIXED SAMPLE NUMBERS REQUIRED BY THE FISHER-FINNEY U SCORE TEST, n(α, β), AND THE HALDANE-SMITH PROBABILITY RATIO TEST, n*(α, β)

n = the required number of double backcross sibships of size 2.

θ1 | α | β | E(n), θ = 1/2 | E(n), θ = θ1 | n(α, β) | n*(α, β)
.05 | .0005 | .01 | 9 | 20 | 34 | 49
.10 | .001 | .01 | 18 | 31 | 59 | 89
.20 | .001 | .01 | 67 | 103 | 214 | 328
.30 | .001 | .01 | 355 | 529 | 1,134 | 1,740
.40 | .001 | .01 | 5,700 | 8,546 | 18,324 | 28,420




The conclusions from table 3 are quite simple and consistent. Of the fixed-sample- 
size tests, u statistics require only about 2/3 as many observations for a given risk 
of error as the Haldane-Smith probability ratio test. If, in view of the conservatism 
of the latter test, a value of α ten times as large is used, the number of observations required by the test is intermediate between n(α, β) and n*(α, β), and is still appreciably in excess of the sample size required by the u score test.

Although the superiority of the u score test over the Haldane-Smith probability 
ratio test is marked, the superiority of the sequential test is even more striking. 
When the alternative hypothesis is true, the sequential test requires only about 
1/2 as many observations as a u score test of the same strength, and when the 
null hypothesis is true (as it usually will be), the sequential test requires less than 
1/3 as many observations as the u score test. Similar savings in the number of 
observations have been found for other distributions by Wald (1947) and Bross 
(1952). 

For the detection of linkage we have knowledge that the user of a sequential test 
does not ordinarily have, in that the approximate parameter distribution is known, 
and we may calculate a mean sequential sample number E(n) averaged over this 
distribution (table 2). Over the range of tests considered, the mean sample number required by a sequential test of strength (α, β) is less than 1/3 the number required by a u score test of the same strength.


7. CLASSIFICATION OF FACTORS, MATINGS, AND METHODS OF SAMPLING 


In view of the considerable saving in observations indicated in the last section, 
sequential tests would seem to be the method of choice for the detection of linkage. 
For practical use, the determination of probabilities must be extended to families 
of different types and sizes. We first require a few definitions. 

Consider two loci, G and T, which are to be tested for linkage. The genetic char- 
acters which are determined by these loci may be divided into four classes. These are: 

1. Recessive abnormalities, such as albinism. The symbols G,g or T,t will be 
used for factors of this class. 

2. Common recessives, such as the gene for the inability to taste phenylthio- 
carbamide. Symbols G,g or T,t will also be used here. 

3. Factors without dominance, the heterozygote being distinguishable from both homozygotes. Sicklemia and the MN blood groups are examples of this class. The letters G1, G2 or T1, T2 will be used for such factors.

4. “Dominant” abnormalities, such as ovalocytosis. The abnormal homozygote is exceedingly rare (in most cases never having been observed), and all abnormal persons are therefore assumed to be heterozygous. The symbol G1 or T1 will be used for the normal allele, G2 or T2 for the abnormal factor.

For a family to give information on linkage, neither parent may be GG or TT 
and at least one parent must be doubly heterozygous. An informative mating is 
termed a double backcross, a single backcross, or a double intercross according 
to whether the other parent is doubly homozygous, singly heterozygous, or doubly 
heterozygous. Since the phase of linkage is unknown, the probability for a double 
or single backcross will consist of two terms, one for each possible phase of the 
doubly heterozygous parent, and the probability for a double intercross will consist
of three terms, corresponding to the possibilities that both parents are in coupling, 
both in repulsion, or that one is in coupling and the other in repulsion. We shall 
assume that the two phases are at equilibrium in the population, a condition that 
should nearly always be closely approximated, except perhaps after recent hybridiza- 
tion. On the null hypothesis this assumption is of course supererogatory. 

It rarely happens that families selected for a linkage study are effectively a random 
sample from the general population. Usually families are selected first on the basis 
of the character determined by the “main” locus and are tested afterwards for the 
character determined by the “test” locus. There are three methods of selecting 
families on the basis of the main character (Bailey, 1951): 

1. Selection through the parents or grandparents, without consideration of the 
children. The sampling of families is effectively random, and in families of a given 
mating type and size, the distribution of the number of children manifesting the 
main character is a complete binomial series (complete selection). 

2. Selection through the children themselves, with complete selection of affected 
individuals. In families of a given mating type and size, the distribution of the 
number of children manifesting the main character is a truncated binomial series, with the first term missing (truncate selection).

3. Selection through the children, with incomplete selection of affected individuals. 
The distribution of affected individuals in sibships of a given mating type and size 
is not a truncated binomial, since families with large numbers of affected children 
are more likely to be ascertained than families with a smaller number of abnormals 
(arbitrary selection). This is the usual method of selection for recessive abnormalities 
and a not uncommon method of selection for “dominant” abnormalities and rare 
factors without dominance. 

Except in cases of gross ascertainment bias, the test character is never subject 
to incomplete selection of affected individuals (method 3). 

It should be noted that these three methods of selecting families for analysis 
subsume the rejection of some classes of ascertained families. The fundamental 
attribute of each type of selection is the distribution to which it gives rise, regardless 
of how the families were detected. For example, with recessive genes the propositus 
is sometimes an affected parent mated to a normal dominant, who may be either 
homozygous or heterozygous. A mating of a dominant parent is called “certain” 
if there is at least one recessive child (in which case the dominant parent must be 
heterozygous), and is called “doubtful” otherwise. Sampling is by method 1 or 2, 
according to whether doubtful families are included or rejected. The method of 
ascertainment is the same in both cases, but the method of selection is different, 
and determines the proper method of analysis. 


8. BOTH CHARACTERS SELECTED THROUGH THE PARENTS (COMPLETE SELECTION), 
PARENTAL GENOTYPES KNOWN, BOTH PARENTS TESTED. COMPLETE PENETRANCE, 
NO NATURAL SELECTION 


Unless there is no dominance for either character, some of the families will usually 
be of uncertain parental genotype. If these doubtful families are analysed separately 


TABLE 4.—MATINGS SCORED WITH Z1. DOUBLE BACKCROSSES AND SINGLE BACKCROSSES WITH NO DOMINANCE IN THE INTERCROSS FACTOR


s = a + b + c + d


Mating | No. | Progeny phenotype: a | b | c | d | Uninformative progeny
Gg Tt × gg tt | 1 | GT | Gt | gT | gt | —
Gg X gg | 2 GT, | g Ti g 
G,Gz Tt X tt 3 G, T G, t T | t - 
Gg TiT: x gg TiT: 4 G Ti G g g TiT. 
G,G2 Tt X G,Ge tt 5 T t G2 T Get 
G,G2 X GiG, T:T; 6 | G, T; | T; G, Gz T;T2 
G,G, Ti:T2 x TiT2 7 Ti Te Ti G,G2 Te | 
G,G2 T;:T: X G,G2 TiT; | 8 G, T; | Gi | G2 | 
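Frequency | a | b | c | d | Total
Coupling | 1 − θ | θ | θ | 1 − θ | 2
Repulsion | θ | 1 − θ | 1 − θ | θ | 2
Total | 1 | 1 | 1 | 1 | 4

$$z_1 = \log\frac{f(y;\theta_1)}{f(y;1/2)} = \log 2^{s-1}\left[(1-\theta_1)^{a+d}\theta_1^{b+c} + \theta_1^{a+d}(1-\theta_1)^{b+c}\right]$$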


(see §12), then the methods of this section are appropriate to the certain families. If the doubtful families are rejected, the certain families should be analysed by the methods of §§9-10.

Neglecting multiple allelism, the possible kinds of certain families may be grouped
into 5 classes, which by the method of u scores have 3 essentially different scores
and 2 derived scores (Finney, 1940). In sequential tests the same classes exist. The
scores in a sequential test are "lods", or logarithms of the probability ratio, the
five functional forms of which may be denoted by z1, z2, z3, z4, and z5, in exact
correspondence with the u11, u21, u22, 2u21, and 2u11 scoring types of Finney.

Tables 4-8 give the possible certain matings and the lod scores appropriate to
them. Matings scored with z1 (table 4) comprise double backcrosses and those single
backcrosses in which there is no dominance for the intercross factor. There is thus
a one-to-one correspondence between progeny genotype and phenotype for both
loci. Note that some progeny have probabilities that are independent of the
recombination fraction and phase, and therefore give no information on linkage.
Matings scored with z2 (table 5) are single backcrosses with dominance in the
intercross factor. Matings scored with z3 (table 6) are double intercrosses with
dominance in both factors. Most matings of common occurrence are scored with the
z1, z2, or z3 lods, of which the z1 type is much the most informative.
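As a rough check on the z1 and z2 formulas of tables 4 and 5, the following Python sketch evaluates both lods for a single family. It assumes logarithms to base 10 (an assumption, though it reproduces entries such as .2577 and −.7212 in the score tables for small families), and the function names z1 and z2 are illustrative only.

```python
import math

def z1(a, b, c, d, theta1):
    """Base-10 lod z1 for a phase-unknown family classified as in table 4.

    a, d: classes that are non-recombinant in coupling; b, c: recombinant in coupling.
    """
    s = a + b + c + d
    rec, nonrec = b + c, a + d
    bracket = theta1**rec * (1 - theta1)**nonrec + theta1**nonrec * (1 - theta1)**rec
    return math.log10(2 ** (s - 1) * bracket)

def z2(a, b, c, d, theta1):
    """Base-10 lod z2 for a phase-unknown family classified as in table 5."""
    s = a + b + c + d
    bracket = ((2 - theta1)**a * theta1**b * (1 + theta1)**c * (1 - theta1)**d
               + (1 + theta1)**a * (1 - theta1)**b * (2 - theta1)**c * theta1**d)
    return math.log10(2 ** (s - 1) / 3 ** (a + c) * bracket)

# Sib-pairs scored at theta1 = .05
print(round(z1(2, 0, 0, 0, 0.05), 4))  # no recombinant: 0.2577
print(round(z1(1, 1, 0, 0, 0.05), 4))  # one recombinant: -0.7212
print(round(z2(2, 0, 0, 0, 0.05), 4))  # two GT children: 0.0374
```

Both functions return 0 at θ1 = 1/2, as any lod must, because the bracket then equals the null probability exactly.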

TABLE 5.—MATINGS SCORED WITH z2. SINGLE BACKCROSSES WITH DOMINANCE IN THE
INTERCROSS FACTOR

Parental genotypes | Mating | Progeny phenotype a | b | c | d | Uninformative progeny
Gg Tt × Gg tt | 9 | GT | gT | Gt | gt | none
Gg Tt × gg Tt | 10 | GT | Gt | gT | gt | none
Gg T1T2 × Gg T1T1 | 11 | G T1 | g T1 | G T1T2 | g T1T2 | none
G1G2 Tt × G1G1 Tt | 12 | G1 T | G1 t | G1G2 T | G1G2 t | none

Frequency | a | b | c | d | Total
Coupling | 2 − θ | θ | 1 + θ | 1 − θ | 4
Repulsion | 1 + θ | 1 − θ | 2 − θ | θ | 4
Total | 3 | 1 | 3 | 1 | 8

$$z_2=\log\frac{f(y;\theta_1)}{f(y;1/2)}=\log\frac{2^{s-1}}{3^{a+c}}\{(2-\theta_1)^a\theta_1^b(1+\theta_1)^c(1-\theta_1)^d+(1+\theta_1)^a(1-\theta_1)^b(2-\theta_1)^c\theta_1^d\}$$

TABLE 6.—MATINGS SCORED WITH z3. DOUBLE INTERCROSSES WITH DOMINANCE IN BOTH FACTORS

Parental genotype | Mating type | Progeny phenotype a | b | c | d
Gg Tt × Gg Tt | 13 | GT | Gt | gT | gt

Parental genotype | Relative frequency | a | b | c | d | Total
GT/gt × GT/gt | 1 | 3 − 2θ + θ² | θ(2 − θ) | θ(2 − θ) | (1 − θ)² | 4
GT/gt × Gt/gT | 2 | 2 + θ − θ² | 1 − θ + θ² | 1 − θ + θ² | θ(1 − θ) | 8
Gt/gT × Gt/gT | 1 | 2 + θ² | 1 − θ² | 1 − θ² | θ² | 4
Total | | 9 | 3 | 3 | 1 | 16

$$z_3=\log\frac{f(y;\theta_1)}{f(y;1/2)}=\log\frac{4^{s-1}}{9^{a}3^{b+c}}\{(3-2\theta_1+\theta_1^2)^a[\theta_1(2-\theta_1)]^{b+c}(1-\theta_1)^{2d}+2(2+\theta_1-\theta_1^2)^a(1-\theta_1+\theta_1^2)^{b+c}[\theta_1(1-\theta_1)]^d+(2+\theta_1^2)^a(1-\theta_1^2)^{b+c}\theta_1^{2d}\}$$

[TABLE 7.—MATINGS SCORED WITH z4. The table is printed sideways in the original and its body is not legible in this copy; surviving fragments show rows for coupling × coupling, coupling × repulsion and repulsion × repulsion matings.]

The two remaining scoring types are of particular interest because the u score
method omits progeny from which information is extracted by the lod scores. Matings
scored with z4 (table 7) are double intercrosses with dominance in only one factor.
There are six progeny phenotypes, the last two of which have probabilities that are
linear functions of θ(1 − θ), whereas the other four types include terms which are
not linear in θ(1 − θ), like θ². When θ → 1/2, the deviation of θ(1 − θ) from 1/4
is vanishingly small compared with the deviation of θ² from 1/4, and the last two
classes contribute almost no information on linkage. It is not surprising, therefore,
that when the probability is expanded in powers of 1 − 2θ, and the cubic and higher
terms neglected, the appropriate u score is a function of only the first four classes
(Finney, 1940). Since loose linkage (θ → 1/2) is never in practice distinguished
from non-linkage (θ = 1/2), the important consideration is that the information
contributed by the neglected progeny (which constitute 1/2 of the total children)
is not negligible when θ is small.
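The argument about θ(1 − θ) and θ² can be checked numerically. This short sketch (the helper name deviations is hypothetical) prints both deviations from 1/4 as θ approaches 1/2; their ratio equals (1/2 − θ)/(θ + 1/2), so the first deviation becomes negligible relative to the second.

```python
def deviations(theta):
    """Distances of theta(1 - theta) and theta**2 from their common value 1/4 at theta = 1/2."""
    dev_product = abs(theta * (1 - theta) - 0.25)  # equals (1/2 - theta)**2: second order
    dev_square = abs(theta**2 - 0.25)              # equals |1/2 - theta| * (theta + 1/2): first order
    return dev_product, dev_square

for theta in (0.40, 0.45, 0.49):
    p, q = deviations(theta)
    print(f"theta = {theta:.2f}: {p:.6f} vs {q:.6f} (ratio {p / q:.4f})")
```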
Matings scored with z5 (table 8) are double intercrosses with no dominance in
either factor. The lod score is based on 9 distinguishable progeny classes, the last 5
of which contribute no information when θ → 1/2, and are therefore neglected in
computing the u scores (Finney, 1940). When θ is small, however, the information
contained in these children (which constitute 3/4 of the progeny) is no longer
negligible.
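A minimal enumeration sketch for the double intercross G1G2 T1T2 × G1G2 T1T2, assuming that each parent is in coupling or repulsion with probability 1/2 independently (the weights 1, 2, 1 of table 6). It lists the nine genotype classes, confirms that the five classes containing a heterozygote make up 3/4 of the progeny, and shows that a class such as G1G1 T1T2 has a probability linear in θ(1 − θ). Function names are illustrative.

```python
from collections import Counter
from itertools import product

def gametes(phase, theta):
    """Gamete frequencies of a G1G2 T1T2 parent: 'C' is G1T1/G2T2, 'R' is G1T2/G2T1."""
    if phase == "C":
        parental = [("G1", "T1"), ("G2", "T2")]
        recombinant = [("G1", "T2"), ("G2", "T1")]
    else:
        parental = [("G1", "T2"), ("G2", "T1")]
        recombinant = [("G1", "T1"), ("G2", "T2")]
    freqs = {g: (1 - theta) / 2 for g in parental}
    freqs.update({g: theta / 2 for g in recombinant})
    return freqs

def progeny(phase1, phase2, theta):
    """Probabilities of the nine (G genotype, T genotype) classes."""
    dist = Counter()
    for (g1, f1), (g2, f2) in product(gametes(phase1, theta).items(),
                                      gametes(phase2, theta).items()):
        child = ("".join(sorted((g1[0], g2[0]))), "".join(sorted((g1[1], g2[1]))))
        dist[child] += f1 * f2
    return dist

def heterozygous_share(theta):
    """Phase-averaged share of children heterozygous at one or both loci."""
    share = 0.0
    for phase1, phase2 in product("CR", repeat=2):  # four phase combinations, 1/4 each
        dist = progeny(phase1, phase2, theta)
        share += 0.25 * sum(p for (g, t), p in dist.items() if g == "G1G2" or t == "T1T2")
    return share

dist = progeny("C", "C", 0.1)
print(len(dist), round(dist[("G1G1", "T1T2")], 6), round(heterozygous_share(0.1), 6))
```

In coupling × coupling the double homozygote G1G1 T1T1 has probability (1 − θ)²/4, which is not a function of θ(1 − θ) alone; this is the kind of class the u scores retain.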


9. ONE CHARACTER SELECTED THROUGH THE PARENTS (COMPLETE SELECTION), THE 
OTHER THROUGH THE CHILDREN (INCOMPLETE SELECTION). PARENTAL GENOTYPES 
KNOWN, BOTH PARENTS TESTED. COMPLETE PENETRANCE, NO NATURAL SELECTION 


For convenience we may denote the factor that is selected through the children
by G, g, G1, or G2, and the factor selected through the parents by T, t, T1, or T2.
The method of this section is appropriate only if families of doubtful parental
genotype with regard to the T locus are not rejected (§12); the selection of the G
factor is arbitrary.

In a family of size s let there be s1 children of one G type, say G, and s2 of the
other (s1 + s2 = s). The prior probability of the family will be designated by f(y;θ)
and the conditional probability by f(y;θ | s1,s2). Then

$$f(y;\theta\mid s_1,s_2)=\frac{f(y;\theta)}{P(s_1,s_2)},$$

where P(s1,s2) is the probability measure of the selected class of families. Since the
two characters are selected independently, and the probabilities which are pooled
in P(s1,s2) are complementary, P(s1,s2) is independent of θ and of the phase of linkage
and cancels when the probability ratio is formed:

$$\frac{f(y;\theta_1\mid s_1,s_2)}{f(y;1/2\mid s_1,s_2)}=\frac{f(y;\theta_1)}{f(y;1/2)}.$$

Thus the probability ratio and the lod score derived from it have the convenient
property of being invariant with respect to biased sampling of one character only,
and families selected in this way are scored just as if both characters had been
ascertained through the parents (Smith, 1953).


10. BOTH CHARACTERS SELECTED THROUGH THE CHILDREN, COMPLETE SELECTION OF 
AFFECTED INDIVIDUALS (TRUNCATE SELECTION). PARENTAL GENOTYPES KNOWN, 
BOTH PARENTS TESTED. COMPLETE PENETRANCE, NO NATURAL SELECTION 


Families in which the parental genotype is unknown for either factor are rejected.
The condition on both factors makes the marginal distribution of the selected
families a function of θ, and the methods of the previous sections require
modification. There are three types to be considered, corresponding to the z1, z2,
and z3 scoring types. We shall suppose that the selected factors are g and t, since
only matings in which both characters are common recessives are likely to be
selected in this way.

(1) The z1 scoring type (Mating 1)

The distribution of the selected families is

$$f(y;\theta\mid g,t)=\frac{f(y;\theta)}{P(g,t)},$$
where P(g,t) is the probability that a mating of this type has at least one g and
one t child. To satisfy this condition, it is sufficient that c + d ≠ 0 and b + d ≠ 0.
Therefore,

$$P(g,t)=1-P(c+d=0)-P(b+d=0)+P(b+c+d=0).$$

Now

$$P(c+d=0)=P(a+b=s)=(1/2)^s=P(b+d=0)$$

and

$$P(b+c+d=0)=P(a=s)=(1/2)^{s+1}\{\theta^s+(1-\theta)^s\},$$

and so

$$P(g,t)=1-2(1/2)^s+(1/2)^{s+1}\{\theta^s+(1-\theta)^s\}.$$

It follows that

$$\log\frac{f(y;\theta_1\mid g,t)}{f(y;1/2\mid g,t)}=\log\frac{f(y;\theta_1)\,P(g,t;1/2)}{f(y;1/2)\,P(g,t;\theta_1)}=z_1+c_1,$$

where

$$c_1=\log\frac{2^s-2+(1/2)^s}{2^s-2+\tfrac12\{\theta_1^s+(1-\theta_1)^s\}}.$$


Thus the lod score in this case, and in general, is simply the score appropriate to
random sampling plus a correction factor which is determined by the method of
selection. The factor c1 is exactly analogous to −c1 in the theory of u scores (Finney,
1940).
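The correction c1 can be checked by brute force. The sketch below, with hypothetical helper names and written under the phase-unknown model of table 4 (coupling and repulsion each with probability 1/2), enumerates every ordered sibship from mating 1, keeps those with at least one g child and one t child, and compares the total with the closed form for P(g,t) and with c1 (base-10 logarithms assumed).

```python
import math
from itertools import product

def mating1_probs(theta, phase):
    """Class probabilities for Gg Tt x gg tt: a = GT, b = Gt, c = gT, d = gt."""
    r = theta if phase == "coupling" else 1 - theta
    return {"a": (1 - r) / 2, "b": r / 2, "c": r / 2, "d": (1 - r) / 2}

def p_gt_enumerated(s, theta):
    """P(at least one g child and one t child) by listing all 4**s sibships."""
    total = 0.0
    for phase in ("coupling", "repulsion"):
        probs = mating1_probs(theta, phase)
        for kids in product("abcd", repeat=s):
            if any(k in "cd" for k in kids) and any(k in "bd" for k in kids):
                total += 0.5 * math.prod(probs[k] for k in kids)
    return total

def p_gt(s, theta):
    """Closed form: 1 - 2(1/2)^s + (1/2)^(s+1) {theta^s + (1 - theta)^s}."""
    return 1 - 2 * 0.5**s + 0.5**(s + 1) * (theta**s + (1 - theta)**s)

def c1(s, theta1):
    return math.log10((2**s - 2 + 0.5**s) / (2**s - 2 + 0.5 * (theta1**s + (1 - theta1)**s)))

for s in (2, 3, 4):
    print(s, round(p_gt_enumerated(s, 0.1), 6), round(p_gt(s, 0.1), 6), round(c1(s, 0.1), 4))
```

Under the null hypothesis the closed form reduces to P(g,t) = {1 − (1/2)^s}², which is a convenient further check.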
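(2) The z2 scoring type (Matings 9 and 10)

Using the same notation as before, we find that

$$\log\frac{f(y;\theta_1\mid g,t)}{f(y;1/2\mid g,t)}=z_2+c_2,$$

where

$$c_2=\log\frac{4^s-3^s-2^s+(3/2)^s}{4^s-3^s-2^s+\tfrac12\{(2-\theta_1)^s+(1+\theta_1)^s\}}.$$

(3) The z3 scoring type (Mating 13)

$$\log\frac{f(y;\theta_1\mid g,t)}{f(y;1/2\mid g,t)}=z_3+c_3,$$

where

$$c_3=\log\frac{4^s-2(3^s)+(9/4)^s}{4^s-2(3^s)+\tfrac14\{(3-2\theta_1+\theta_1^2)^s+2(2+\theta_1-\theta_1^2)^s+(2+\theta_1^2)^s\}}.$$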
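The same brute-force check extends to c2 and c3. This sketch, with hypothetical helper names, takes the class probabilities from tables 5 and 6 (entries divided by 4, with phase weights 1/2, 1/2 for mating 9 and 1/4, 1/2, 1/4 for mating 13), enumerates all sibships of size s, and compares the selection probabilities with the closed-form corrections, using base-10 logarithms.

```python
import math
from itertools import product

def mating9(theta):
    # Table 5, mating 9 (Gg Tt x Gg tt): a = GT, b = gT, c = Gt, d = gt.
    coupling = {"a": (2 - theta) / 4, "b": theta / 4, "c": (1 + theta) / 4, "d": (1 - theta) / 4}
    repulsion = {"a": (1 + theta) / 4, "b": (1 - theta) / 4, "c": (2 - theta) / 4, "d": theta / 4}
    return [(0.5, coupling), (0.5, repulsion)]

def mating13(theta):
    # Table 6, mating 13 (Gg Tt x Gg Tt): a = GT, b = Gt, c = gT, d = gt.
    cc = {"a": (3 - 2*theta + theta**2) / 4, "b": theta * (2 - theta) / 4,
          "c": theta * (2 - theta) / 4, "d": (1 - theta)**2 / 4}
    cr = {"a": (2 + theta - theta**2) / 4, "b": (1 - theta + theta**2) / 4,
          "c": (1 - theta + theta**2) / 4, "d": theta * (1 - theta) / 4}
    rr = {"a": (2 + theta**2) / 4, "b": (1 - theta**2) / 4,
          "c": (1 - theta**2) / 4, "d": theta**2 / 4}
    return [(0.25, cc), (0.5, cr), (0.25, rr)]

def p_selected(phases, s, g_classes, t_classes):
    """P(at least one g child and one t child), enumerating all 4**s sibships."""
    total = 0.0
    for weight, probs in phases:
        for kids in product("abcd", repeat=s):
            if any(k in g_classes for k in kids) and any(k in t_classes for k in kids):
                total += weight * math.prod(probs[k] for k in kids)
    return total

def c2(s, theta1):
    return math.log10((4**s - 3**s - 2**s + 1.5**s)
                      / (4**s - 3**s - 2**s + 0.5 * ((2 - theta1)**s + (1 + theta1)**s)))

def c3(s, theta1):
    bracket = ((3 - 2*theta1 + theta1**2)**s + 2 * (2 + theta1 - theta1**2)**s
               + (2 + theta1**2)**s)
    return math.log10((4**s - 2 * 3**s + 2.25**s) / (4**s - 2 * 3**s + 0.25 * bracket))

for s in (2, 3):
    brute2 = math.log10(p_selected(mating9(0.5), s, "bd", "cd") / p_selected(mating9(0.1), s, "bd", "cd"))
    brute3 = math.log10(p_selected(mating13(0.5), s, "cd", "bd") / p_selected(mating13(0.1), s, "cd", "bd"))
    print(s, round(c2(s, 0.1), 6), round(brute2, 6), round(c3(s, 0.1), 6), round(brute3, 6))
```

Note that the g and t classes differ between the two tables: in table 5 the g children are b and d, whereas in table 6 they are c and d.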


11. BOTH CHARACTERS SELECTED THROUGH THE CHILDREN, ONE COMPLETELY 
(TRUNCATE SELECTION), THE OTHER INCOMPLETELY (ARBITRARY SELECTION). 
PARENTAL GENOTYPES KNOWN, BOTH PARENTS TESTED. COMPLETE PENETRANCE, 
NO NATURAL SELECTION 


Let the character with arbitrary selection be denoted by G, g or G1, G2, and let t
denote the character with truncate selection. The family is ascertained through the
G factor and then tested for the T factor, with rejection of families in which there
is not at least one t child. (If these families are not rejected, or if there is no
dominance in the T factor, see §9.) Occasionally the method of incomplete ascertainment
of the G factor may be known exactly, but the simplest and most reliable procedure
is to consider the distribution of the families with the G factor fixed, so that the
method of selection does not enter into the argument (Finney, 1940).


A. Dominance in the G factor (G,g type) 


Let there be s1 children of type G and s2 of type g (s1 + s2 = s). The distribution
of selected families is

$$f(y;\theta\mid s_1,s_2,t)=\frac{f(y;\theta)}{P(s_1,s_2,t)},$$

where P(s1,s2,t) is the probability measure of selected families of this class. Note
that s2 = 0 implies ascertainment of the G factor through the parents or
uninformative children; hence the s1,s2 method of scoring is not appropriate unless
s2 > 0 or the viability of the G,g types is abnormal.

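(1A) The z1 scoring type (Mating 1)

$$P(s_1,s_2,t)=P(s_1,s_2)-P(s_1,s_2,b+d=0),$$

where

$$P(s_1,s_2)=k\binom{s}{s_1}(1/2)^{s_1}(1/2)^{s_2},$$

k being a selection factor dependent only on s1 and s2, and

$$P(s_1,s_2,b+d=0)=P(a=s_1,c=s_2)=k\binom{s}{s_1}(1/2)^{s+1}\{\theta^{s_1}(1-\theta)^{s_2}+\theta^{s_2}(1-\theta)^{s_1}\}.$$

Therefore,

$$\log\frac{f(y;\theta_1\mid s_1,s_2,t)}{f(y;1/2\mid s_1,s_2,t)}=z_1+e_1,$$

where

$$e_1=\log\frac{1-(1/2)^{s}}{1-\tfrac12\{\theta_1^{s_1}(1-\theta_1)^{s_2}+\theta_1^{s_2}(1-\theta_1)^{s_1}\}}.$$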
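As a check on e1, this sketch (hypothetical helper names) enumerates all sibships of mating 1 with exactly s1 G children and at least one t child, averaging over the two phases with probability 1/2 each. The selection factor k and the binomial coefficient are omitted because they cancel in the ratio; base-10 logarithms are assumed.

```python
import math
from itertools import product

def class_probs(theta, phase):
    """Gg Tt x gg tt: a = GT, b = Gt, c = gT, d = gt."""
    r = theta if phase == "coupling" else 1 - theta
    return {"a": (1 - r) / 2, "b": r / 2, "c": r / 2, "d": (1 - r) / 2}

def p_selected(s1, s2, theta):
    """Up to the factor k: P(s1 G and s2 g children, at least one t child), phase unknown."""
    s = s1 + s2
    total = 0.0
    for phase in ("coupling", "repulsion"):
        probs = class_probs(theta, phase)
        for kids in product("abcd", repeat=s):
            n_G = sum(k in "ab" for k in kids)  # G children are classes a and b
            if n_G == s1 and any(k in "bd" for k in kids):
                total += 0.5 * math.prod(probs[k] for k in kids)
    return total

def e1(s1, s2, theta1):
    s = s1 + s2
    mixed = theta1**s1 * (1 - theta1)**s2 + theta1**s2 * (1 - theta1)**s1
    return math.log10((1 - 0.5**s) / (1 - 0.5 * mixed))

print(round(e1(2, 1, 0.1), 4), round(math.log10(p_selected(2, 1, 0.5) / p_selected(2, 1, 0.1)), 4))
```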


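(2A) The z2 scoring type (Mating 9)

$$\log\frac{f(y;\theta_1\mid s_1,s_2,t)}{f(y;1/2\mid s_1,s_2,t)}=z_2+e_2,$$

where

$$e_2=\log\frac{3^{s_1}-(3/2)^{s_1}(1/2)^{s_2}}{3^{s_1}-\tfrac12\{(2-\theta_1)^{s_1}\theta_1^{s_2}+(1+\theta_1)^{s_1}(1-\theta_1)^{s_2}\}}.$$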
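(3A) The z2 scoring type (Mating 10)

$$\log\frac{f(y;\theta_1\mid s_1,s_2,t)}{f(y;1/2\mid s_1,s_2,t)}=z_2+e_2',$$

where

$$e_2'=\log\frac{2^{s}-(3/2)^{s}}{2^{s}-\tfrac12\{(2-\theta_1)^{s_1}(1+\theta_1)^{s_2}+(1+\theta_1)^{s_1}(2-\theta_1)^{s_2}\}}.$$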


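(4A) The z3 scoring type (Mating 13)

$$\log\frac{f(y;\theta_1\mid s_1,s_2,t)}{f(y;1/2\mid s_1,s_2,t)}=z_3+e_3,$$

where

$$e_3=\log\frac{3^{s_1}-(9/4)^{s_1}(3/4)^{s_2}}{3^{s_1}-\tfrac14\{(3-2\theta_1+\theta_1^2)^{s_1}[\theta_1(2-\theta_1)]^{s_2}+2(2+\theta_1-\theta_1^2)^{s_1}(1-\theta_1+\theta_1^2)^{s_2}+(2+\theta_1^2)^{s_1}(1-\theta_1^2)^{s_2}\}}.$$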


B. Incomplete dominance in the G factor (G,,G2 type) 


Rare “dominants” and a few characters lacking dominance (sicklemia, thalassemia)
are sometimes selected incompletely in this way. This situation was not
considered by Finney (1940). 

(1B) The z1 scoring type (Mating 3)

Let s1 be the number of G1 children, and s2 be the number of G1G2 children. Then
the probability ratio is the same as for type 1A above, and

$$\log\frac{f(y;\theta_1\mid s_1,s_2,t)}{f(y;1/2\mid s_1,s_2,t)}=z_1+e_1.$$
(2B) The z1 scoring type (Mating 5)

If the family is selected through a G1G2 child, then there is random sampling for
the informative progeny, and the method of §9 applies. If selection is through
an informative G1 or G2 child, then

f(y;θ1 | s1, s2, t) [the remainder of this equation is not legible in the source copy]

where s1 is the number of G1 children and s2 the number of G2 children.
(3B) The z2 scoring type (Mating 12)

Let there be s1 children of type G1 and s2 children of type G1G2. The probability
ratio is the same as for 3A above, and

$$\log\frac{f(y;\theta_1\mid s_1,s_2,t)}{f(y;1/2\mid s_1,s_2,t)}=z_2+e_2'.$$
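(4B) The z4 scoring type (Mating 15)

Let there be s1 children of type G1, s2 of type G1G2, and s3 of type G2
(s1 + s2 + s3 = s). Then

$$\log\frac{f(y;\theta_1\mid s_1,s_2,s_3,t)}{f(y;1/2\mid s_1,s_2,s_3,t)}=z_4+e_4,$$

where

$$e_4=\log\frac{1-(3/4)^{s}}{1-\tfrac14\{(1-\theta_1^2)^{s_1}(1-\theta_1+\theta_1^2)^{s_2}[\theta_1(2-\theta_1)]^{s_3}+[\theta_1(2-\theta_1)]^{s_1}(1-\theta_1+\theta_1^2)^{s_2}(1-\theta_1^2)^{s_3}\}-(1/2)^{s_2+1}(1-\theta_1+\theta_1^2)^{s_1+s_3}(1+2\theta_1-2\theta_1^2)^{s_2}}.$$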


This completes the analysis of the matings in tables 4-8. These include all the 
scoring types of Finney (1940), who used 3 essentially different scores, 2 derived 
scores, 7 score corrections, and 12 essentially different information functions. For 
the same matings, the probability ratio method requires only 5 scores and 7 correc- 
tion factors. The development of the probability ratio scores is extremely simple 
and may easily be extended to more complex cases, such as multiple allelism, un- 
certain parental genotypes, and pedigree data. To facilitate numerical analysis of 
the matings that have been treated so far, the scores for small families are given 
in tables 10-18. 


12. PARENTS OF UNKNOWN GENOTYPE, BOTH PARENTS TESTED. COMPLETE PENETRANCE, 
NO NATURAL SELECTION 

Parental heterozygosity for recessive factors can be established by the observa- 
tion of recessive children, in the absence of which a family without pedigree informa- 
tion is termed “doubtful”. Information may still be extracted from these families, 
provided that the population gene frequencies are known and that mating is at 
random with respect to the doubtful locus. We have seen in §9 that when families 
are selected through the parents for the test factor, and doubtful families are not 
rejected, then no score correction is needed for families of known parental genotype 
regardless of how the main character is selected. Matings doubtful for the main 
character may also be analysed. 

In connection with the doubtful families it will be convenient to introduce a few
new symbols. Let p_t denote the frequency of the t gene and p_g the frequency of the
g gene. Occasionally children will not be scorable for linkage, either because they
are uninformative or because they are incompletely tested. If these children are
tested for the doubtful character, they give information about the parental
genotypes and should enter into the present calculations. Let S be the number of scored
and unscored children which are tested for the doubtful character, in contradistinction
to s, the number of children which are scored for linkage. As an example of the
general procedure, we shall develop scores for the “doubtful” analogues of the z1
scoring type.


(1) Families doubtful for the t factor (Matings 1, 3, 5)

All children are of type T. The prior probabilities for homozygosity and
heterozygosity of the T parent are (1 − p_t)² and 2p_t(1 − p_t), and the conditional
probabilities for the children are

$$(1/2)^{s}\quad\text{and}\quad\tfrac12\{\theta^{c}(1-\theta)^{a}+\theta^{a}(1-\theta)^{c}\}(1/2)^{S}$$

respectively. Therefore,

$$\log\frac{f(y;\theta_1)}{f(y;1/2)}=\log\frac{2^{S-s}(1-p_t)+p_t\{\theta_1^{a}(1-\theta_1)^{c}+\theta_1^{c}(1-\theta_1)^{a}\}}{2^{S-s}\{1-p_t[1-(1/2)^{S-1}]\}}.$$
(2) Families doubtful for the g factor (Matings 1, 2, 4)

All children are of type G. The probability ratio is the same as for the previous
type, except for the substitution of p_g for p_t and b for c:

$$\log\frac{f(y;\theta_1)}{f(y;1/2)}=\log\frac{2^{S-s}(1-p_g)+p_g\{\theta_1^{a}(1-\theta_1)^{b}+\theta_1^{b}(1-\theta_1)^{a}\}}{2^{S-s}\{1-p_g[1-(1/2)^{S-1}]\}}.$$
(3) Families doubtful for the g and t factors (Mating 1)

All children are of type GT. The GT parent may be GGTT, GgTT, GGTt, or GgTt,
only the last of which is informative. The lod score is

$$\log\frac{f(y;\theta_1)}{f(y;1/2)}=\log\frac{2^{s}-2(2^{s-1}-1)(p_g+p_t)+p_gp_t[2^{s}-4+2\{\theta_1^{s}+(1-\theta_1)^{s}\}]}{2^{s}-2(2^{s-1}-1)(p_g+p_t)+p_gp_t(2^{s}-4+2^{2-s})}.$$
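A minimal sketch of the doubtful-family lods for cases (1) and (3), using base-10 logarithms; function names are hypothetical. Two sanity checks follow from the algebra: each lod vanishes at θ1 = 1/2, and when the gene frequencies equal 1 (so the informative parent is certainly heterozygous) and S = s, each reduces to the z1 score of a certain family.

```python
import math

def lod_doubtful_t(a, c, S, p_t, theta1):
    """Case (1): the T parent may be TT or Tt; all children are T.

    a, c: scored children in the two informative classes; S: children tested for T.
    """
    s = a + c
    scale = 2 ** (S - s)
    num = scale * (1 - p_t) + p_t * (theta1**a * (1 - theta1)**c + theta1**c * (1 - theta1)**a)
    den = scale * (1 - p_t * (1 - 0.5 ** (S - 1)))
    return math.log10(num / den)

def lod_doubtful_gt(s, p_g, p_t, theta1):
    """Case (3): all s children GT; the GT parent may be GGTT, GgTT, GGTt or GgTt."""
    base = 2**s - 2 * (2**(s - 1) - 1) * (p_g + p_t)
    num = base + p_g * p_t * (2**s - 4 + 2 * (theta1**s + (1 - theta1)**s))
    den = base + p_g * p_t * (2**s - 4 + 2.0**(2 - s))
    return math.log10(num / den)

print(round(lod_doubtful_t(3, 0, 3, 0.1, 0.1), 4), round(lod_doubtful_gt(3, 0.1, 0.2, 0.1), 4))
```

Because the scores for rare genes are scaled down by p_t or p_g, the sketch also illustrates why such families contribute relatively little information.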

The scoring system for the doubtful families may easily be extended to the
analogues of the z2, z3, and z4 scoring types. However, the application of these scores
is quite tedious in the absence of ancillary tables for each of the common test factors 
and, more important, the doubtful families have in practice been found to con- 
tribute relatively little information on linkage. Finney found in one example that 
scoring doubtful families for the ABO locus increased the available amount of in- 
formation by only 5%, and he advised that “for a preliminary investigation of a 
linkage, scoring may well be confined to the certain families” (Finney, 1940). This 
policy, besides reducing the labor in linkage detection, has the further advantage of 
making linkage tests independent of the mating system and the population gene 
frequencies. Unless the data are extremely valuable, it seems best to score only 
the certain families, using where necessary the correction factors of §§10-11. 


13. ONE OR BOTH PARENTS NOT DIRECTLY TESTED, COMPLETE PENETRANCE, NO 
NATURAL SELECTION 


The extraction of information from untested parents by the method of u scores 
involves considerable algebraic manipulation and heavy arithmetic. Finney (1941b) 
has treated a few special cases and Smith (1953) has suggested an approximation 
for use in large samples. Fortunately the probability ratio method is so simple 
that ad hoc computation is always feasible, although the calculations are still tedious. 

Suppose first that all ascertained families with untested parents are to be analysed, 
subject to the condition that families are sampled through the parents for both 
characters or that they are sampled through the parents for one character and the 
parental genotypes for the other character are known. On these assumptions the 
method of ascertainment does not affect the calculation, which consists in enu- 
merating all parental genotypes which could give rise to F, the family in question, 
and then computing from the population gene frequencies and the assumption of 
random mating the prior probabilities of the mating types, say P(M1), P(M2), ···,
etc. The conditional probabilities, P(F | M1), P(F | M2), ···, etc. are then
calculated. Finally, the score for linkage is computed as


$$\log\frac{f(y;\theta_1)}{f(y;1/2)}=\log\frac{\sum_i P(M_i)\,P(F\mid M_i,\theta_1)}{\sum_i P(M_i)\,P(F\mid M_i,1/2)},$$

which of course is zero if none of the conditional probabilities is a function of θ.
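The enumeration above can be sketched generically: supply each candidate mating type with its prior and a function giving P(F | M_i, θ). The function name and the example family below are hypothetical; the example simply recasts case (1) of §12 (a Gg parent who is either TT or Tt) as a sum over mating types, so the result should agree with that closed form. Logarithms are to base 10.

```python
import math

def untested_parent_lod(matings, theta1):
    """Lod for a family F when the parental mating type is uncertain.

    matings: iterable of (prior P(M_i), function theta -> P(F | M_i, theta)).
    Priors need not be normalized, since a common factor cancels in the ratio.
    """
    matings = list(matings)
    num = sum(prior * likelihood(theta1) for prior, likelihood in matings)
    den = sum(prior * likelihood(0.5) for prior, likelihood in matings)
    return math.log10(num / den)

# Hypothetical family: three GT children (a = 3, c = 0) from GgT? x gg tt;
# p_t is an assumed frequency of the t gene.
p_t, a, c = 0.2, 3, 0
s = a + c
matings = [
    ((1 - p_t) ** 2, lambda theta: 0.5 ** s),                      # parent GgTT
    (2 * p_t * (1 - p_t),                                          # parent GgTt, phase unknown
     lambda theta: 0.5 ** (s + 1) * (theta**a * (1 - theta)**c + theta**c * (1 - theta)**a)),
]
print(round(untested_parent_lod(matings, 0.1), 4))
```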


These calculations are straightforward but time-consuming, and the investigator
of human linkage would be well-advised to test both parents whenever possible. 
Full information cannot be recovered from incomplete records, although large 
families, whose scores are dominated by the conditional probabilities, are nearly as 
informative as if both parents had been tested. If instances of incomplete parental 
testing are not too common, no great amount of information will be lost by rejecting 
families with incomplete parental records. Alternatively, the scoring of incomplete 
records may be restricted to families whose parental genotypes can be inferred with 
certainty. In this case the linkage test is independent of gene frequencies and the 
mating structure of the population, considerable labor is saved, and at least some 
large families with only one tested parent will be included in the analysis. The score 
for the families whose parental genotypes are inferred is z + C, where z is the score 
appropriate to complete selection with both parents tested and C is a correction 
factor dependent on the method of sampling and inference. There are many special 
cases for C, all of which are easily treated ad hoc by the elementary methods used
in §§10-11.


14. NATURAL SELECTION AND INCOMPLETE PENETRANCE 


Genetic main factors with incomplete penetrance or low viability may still be 
used for linkage studies if we assume that the test factor is fully penetrant, viable, 
sampled at random through the parents or through complete selection of affected 
children, and that the viability and penetrance of the main factor are independent 
of the test factor. 

For example, suppose the main factor is fully penetrant but so subvital that many 
affected progeny die before examination. On the above assumptions, it is still proper 
to test linkage by the methods of §§9 and 11, and the probabilities of Type I and 
Type II errors remain unaltered. Notice that no assumption need be made about 
the constancy of viability among families, either in the detection or estimation of 
linkage. 

Again, suppose that the main gene is incompletely penetrant, with no assumptions 
made about viability or ascertainment. We shall assume that the main factor is so 
rare that all matings will be backcrosses if the main factor is a rare “dominant” or 
intercrosses if the main factor is a rare recessive. Given the above conditions on the 
test factor, the probability of a Type I error when the methods of §§9 and 11 are 
used will not be changed, regardless of whether penetrance is variable or not, but 
the power of the test will decrease very greatly when penetrance is low. In this case 
estimation of the penetrance will improve the power of the test, without affecting 
the probability of a Type I error. 

In practice, the distinction between loose linkage to the main factor and linkage 
to viability or penetrance modifiers may be difficult to make, and therefore only 
tests of close linkage have much value when viability or penetrance is irregular. 
Even with such tests the rigorous justification of the assumption that the test factor 
does not influence the viability or penetrance of the main factor is extremely
difficult, and may well be attempted only for tests which indicate a significant “linkage”.
Proof that the main and test factors are distributed independently in the general 
population, the absence of a correlation between the test phenotype of affected 
parents and affected progeny, constant penetrance, and homogeneity of the linkage
value give supporting evidence for the hypothesis of linkage, while contrary observa- 
tions suggest alternative explanations. Knowledge of the exact method of ascertain- 
ment is helpful in detecting irregularities, especially with rare recessive factors. All 
these problems are particularly acute when the test factor is extremely complex, and 
great difficulties have been encountered in attempts to distinguish linkage when 
sex is used as the test factor (Harris, 1948; Mohr, 1954). Even with less fundamental 
test traits, a significant “linkage” effect requires special scrutiny when the pene- 
trance or viability of the main factor is low. If the test factor also behaves irregularly, 
the difficulties in linkage detection are vastly increased. 


15. THE COMBINATION OF DATA 


In §§5-6 the properties of the sequential probability ratio test were illustrated 
on the simplifying assumption that the data consist entirely of double backcross 
sibships of size 2, and it was shown that for this case the sequential test is very 
much superior to alternative procedures. In practice, linkage data in man comprise 
a mixture of family sizes and mating types, the frequencies of which vary among 
pairs of loci and are usually unspecified. We shall now show that this ignorance does 
not affect the important properties of the sequential test. 

Let k = 1, 2, ···, denote a particular mating type and family size, f_k(y;θ) be
the conditional distribution for the kth type of data, and p_k be the prior probability
of this type of data. Consider only sampling procedures for which p_k and f_k(y;θ)
are independent of the stage of sampling. Then clearly the distribution p_k f_k(y;θ) is
of the stationary type treated by Wald, and all the important results of his sequential
theory apply. In particular, it has been shown that of all tests with the same risk of
error (α, β), the sequential probability ratio test requires on the average fewest
observations, and that the Type I and Type II risks are approximately

$$\alpha\approx\frac{1-B}{A-B},\qquad\beta\approx\frac{B(A-1)}{A-B},$$

these approximations being very good when the excess of Σz over the boundary
log A or log B is negligible. This condition is satisfied if |E(z)| and the standard
deviation σ_z of z are sufficiently small, as in practice they usually will be. In any
case the optimum character of the sequential test holds exactly (Wald and Wolfowitz,
1948).
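The risk approximations and the stopping rule can be sketched in a few lines. This assumes base-10 lods, the boundaries A = 1000 and B = .01 used for Fig. 5, and the convention that reaching log A accepts linkage while reaching log B rejects it; the function names and the sequence of family scores are hypothetical.

```python
import math

def wald_risks(A, B):
    """Approximate Type I and Type II risks for boundaries A > 1 > B."""
    alpha = (1 - B) / (A - B)
    beta = B * (A - 1) / (A - B)
    return alpha, beta

def sequential_test(lods, A=1000.0, B=0.01):
    """Add base-10 lods family by family until the total leaves (log B, log A)."""
    upper, lower = math.log10(A), math.log10(B)
    total = 0.0
    for n, z in enumerate(lods, start=1):
        total += z
        if total >= upper:
            return "accept linkage", n, total
        if total <= lower:
            return "reject linkage", n, total
    return "continue sampling", len(lods), total

alpha, beta = wald_risks(1000.0, 0.01)
print(f"alpha = {alpha:.5f}, beta = {beta:.5f}")
print(sequential_test([-0.7212, 0.2577, -0.7212, -0.7212, -0.7212]))
```

Because lods from families of different types simply add, the same loop serves for mixed data, which is the point of the argument above about stationary distributions.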

Although the existence of a stationary distribution p_k f_k(y;θ) is sufficient for the
proof of the above remarks, it is not necessary that the p_k be known to carry out the
test. For the p_k are independent of θ, and therefore the probability ratio

$$\frac{\prod p_k f_k(y;\theta_1)}{\prod p_k f_k(y;1/2)}$$

is identical with the ratio

$$\frac{\prod f_k(y;\theta_1)}{\prod f_k(y;1/2)}.$$
[Fig. 5. The power function P(θ) for different types of data. A = 1000, B = .01, θ1 = .20.]


Determination of the p_k is necessary only if it is desired to find the power function
and average sample number function of a sequential test, but this is of secondary 
importance so long as there is some basis for the choice of a particular test and we 
know that the sequential test on the average leads to a saving in the number of 
observations. 

To choose a sequential test, it is convenient to have a rough notion of the average 
power of alternative tests. The power function depends on the distribution p_k,
but the risks (α, β) do not, and this limits the possible fluctuation of the power
function. Figure 5 shows a typical power function for three different types of data.
The power function and the average power do not seem to be so highly variable as
to jeopardize the control over Type I errors demanded for the idealized case in §5.
In particular, it still seems appropriate to choose an unusually small value of α, of
the order of .001.

The choice of θ1 for a sequential test is largely determined by the average sample
number on the null hypothesis, since (1) for randomly chosen loci the null hypothesis 
will usually be true and (2) the number of observations that can be tolerated is 
not narrowly bounded, so that random excesses over the expected number will 
usually not be a serious annoyance. A rough correspondence between expected 
sample number and amount of information may be established as follows. 

Let n be the number of families required to terminate the test in mixed data and
n_k be the number of families required for the test in data entirely of the kth type.
Let E(z) denote the expected value of z in mixed data and E(z_k) the expected value
of z in the kth type of data. Also let c be a fixed value of k. Then on the null
hypothesis

$$E(n)E(z)=\alpha\log A+(1-\alpha)\log B=E(n_c)E(z_c)$$

and
TABLE 9.—THE EFFICIENCY OF DIFFERENT TYPES OF DATA IN DOUBLE BACKCROSS
SIB-PAIR EQUIVALENTS


1¢ 
6 _ u score 
A. Families of size s, phase unknown 
=2 1.0 1.0 1.0 1.0 
Z2,8 = 2 1 1 
3.8 5.3 8.8 10.0 
a,s = 10 9.7 14.7 33.1 45.0 
B. Single progeny, phase known 
double backcross 1.6 3.2 25.4 — 
single backcross a 1.0 8.5 — 
1.8 12.3 — 


double intercross, coupling, both factors dominant 1.0 


where i = 1, 2, ···, E(n) denotes successive observations from the distribution
p_k f_k(y;θ). If we let c designate double backcross sibships of size 2, then the ratio
E(z_k)/E(z_c) may be called the double backcross sib-pair equivalent on the null
hypothesis. It has the property that if E(n_c) is the average number of double
backcross sib-pairs required by a certain test when θ = θ0 = 1/2, then E(n_c)E(z_c)/E(z_k)
is the average number of families of type k required for the same test, assuming
in both cases that the excess over the boundaries at the termination of the test can
be neglected. Furthermore, for small families E(z_k)/E(z_c) is of the same order as the
information weight k in Finney's (1940) system of u scores (table 9). It follows that
if S is the number of units of u score information that can be obtained with
"reasonable" effort, then S is an estimate of Σ E(z_k)/E(z_c) and of E(n_c) also, and
this correspondence may serve as a rough guide in the selection of a sequential test. If S
is about 10, θ1 should be chosen to be .05, since E(n_c) = 9 for θ1 = .05. Similarly, if
S is about 70, θ1 should be taken as .20; if S is as much as 350, θ1 may be .30; and
only if S is about 6000 should θ1 be .40. For linkage of two common test factors
(ABO, Rh, MN), S may be as much as 6000, and for two less common test factors
(Le, Lu, P, Fy blood groups), S may be 350. In most other cases S is probably
smaller than 100, and θ1 should be chosen accordingly. If it turns out that S has
been considerably underestimated, a second test with a larger value of θ1 will not
increase α beyond tolerable limits.
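The sample numbers quoted above can be reproduced approximately. This sketch assumes, as in the caption of Fig. 5, A = 1000 and B = .01, takes lods to base 10, and uses Wald's approximation for α; it computes E(z_c) for phase-unknown double backcross sib-pairs under θ = 1/2 and then E(n_c) from E(n_c)E(z_c) = α log A + (1 − α) log B. It gives about 9, 66, 354 and 5700 sib-pairs for θ1 = .05, .20, .30 and .40, in line with the values of S quoted. Function names are hypothetical.

```python
import math

def expected_z_null(theta1):
    """E(z1) under theta = 1/2 for a phase-unknown double backcross sib-pair (s = 2).

    The number of recombinants k is Binomial(2, 1/2) when theta = 1/2.
    """
    total = 0.0
    for k, prob in ((0, 0.25), (1, 0.5), (2, 0.25)):
        bracket = theta1**k * (1 - theta1)**(2 - k) + theta1**(2 - k) * (1 - theta1)**k
        total += prob * math.log10(2 * bracket)
    return total

def expected_sibpairs(theta1, A=1000.0, B=0.01):
    """E(n_c) = [alpha log A + (1 - alpha) log B] / E(z_c), with Wald's approximate alpha."""
    alpha = (1 - B) / (A - B)
    return (alpha * math.log10(A) + (1 - alpha) * math.log10(B)) / expected_z_null(theta1)

for theta1 in (0.05, 0.20, 0.30, 0.40):
    print(f"theta1 = {theta1:.2f}: E(n_c) = {expected_sibpairs(theta1):.1f}")
```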

The restriction of the sampling procedure to stationary distributions has pro- 
scribed a valid sampling method that in some respects seems desirable. All types of 
data might be collected at the beginning of sampling and whenever linkage is sug- 
gested, but when there is no suggestion of linkage it would seem economical to 
investigate only highly informative families for which the double backcross sib-pair 
equivalent is large. This makes p_k dependent on Σz, but f_k(y;θ) is not affected
and the probability is still one that the procedure will eventually terminate. It is of 
course essential that data be reported without regard for whether they indicate 
linkage or not. Wald (1947) has shown that the postulated kind of dependence does 
TABLE 11
Zz 
a 

.05 10 .20 .30 .40 
0374 .0298 .0170 .0019 
—.1967 | —.1062 .0238 .0058 
—.0410 | —.0320 0177 .0078 .0019 
1038 .0840 .0492 
2577 1335 | 0170 
1038 .0840 .0492 0226 0058 
—.7212 | —.4437 1938 .0757 0177 
—.1367 | —.1042 .0238 
2577 1335 0645 
1038 .0840 .0492 
~.2596 | —.1908 .0969 
—.0410 | —.0320 0177 .0078 .0019 
2122 1754 .1072 0133 
1038 .0840 0226 .0058 
—.0410 | —.0320 .0078 
| 0757 0177 
—.0410 | —.0320 0177 
~.0410 | —.0320 0177 .0078 
3711 2041 0280 
5353 4654 3181 0492 
3711 3153 2041 0280 
| 1938 0757 0177 
2122 1072 .0509 0133 
—.7212 | —.4437 1938 0757 0177 
| .1938 0177 
1038 0226 .0058 
—.2596 | —.1908 0098 
1038 .0492 0226 0058 
5353 4654 3181 1703 0492 
.0940 0441 0114 
—.3608 | —.2532 0118 

—.0035 | —.0022 .0007 .0001 0 
3231 1717 0843 0226 
—.0492 | —.0442 | —.0295 0144 0038 
~.1776 | —.1362 | .0732 —.0316 0078 
—.6838 | —.4139 | —.1768 | —.0681 0158 
—.0819 | —.0641 0355 | —.0156 0039 
0628 | .0519 0315 | 0038 
2775 | «1442 0406 
TABLE 11.—Continued
| 6 
¢ | } } 4 

0s | 20 .30 .40 

.3804 .2245 1178 0332 
1/2/11] 2167 .1828 1158 .0567 .0151 
| —.8579 | —.5479 | —.2493 .0995 .0236 
ave 0628 .0519 .0315 .0148 .0038 
1 1 — .7622 — .4757 — .2115 .0835 0197 
1) 1} 0 | 2 | ~.6174 | —.3597 | —.1446 .0532 0120 
‘ieisis —.0035 | —.0022 | —.0007 .0001 0 
—.1776 | —.1362 | —.0732 .0316 .0078 
1 | 2167 . 1828 .1158 .0567 .0151 
1) 0] 0 | 3 6492 5678 .3950 .2171 .0647 

| | 

| 0 8140 7201 .5171 .2979 .0940 

40 6492 5678 .3950 .2171 .0647 

}o|3}]o0] 1 — .4636 | —.2289 | —.0603 .0113 .0007 

4847 .4166 .2775 .0406 

}o)}2/1 | 1 —.6174 | —.3597 | —.1446 .0532 0120 

| } 

2] 0 | 2 | —1.4425 | —.8874 | —.3876 .1514 .0355 
o;1{|3 {0 3231 2715 1717 .0843 .0226 
0/1 | 2 | 1 — .6838 | —.4139 | —.1768 .0681 0158 

}o}1] 1] 2 —.8579 | —.5479 | —.2493 .0995 .0236 

1]0] 3 —.4636 | —.2289 | —.0603 .0113 .0007 

| 0 | 0 | 4/0 .1898 1559 .0940 .0441 0114 

|o|joj|3]|1 —.3608 | —.2532 | —.1219 .0494 .0118 

| o|o|2]| 2 —.0492 | —.0442 | —.0295 0144 .0038 

}o}ol]1]3 3804 3311 2245 1178 0332 

|o0/0}]0| 4 8140 7201 5171 .2979 0940 

| 

5 | 5 | 0 | 0} 0 2879 .2396 1486 .0716 0189 
4/11]01]0 —.4307 | —.2859 | —.1294 .0507 .0118 
4/o0j;11]0 0628 0519 .0315 0148 .0038 
4/0]0|1 .4354 3703 2407 .1219 .0335 
3 | 2 0 | 0 | ~.2006 | —.1678 | —.1004 0458 0116 
| | | 
| 1 | 0 | — .3006 — .2229 — .1146 .0482 0117 
3/1]|0)1 —.6174 | —.3597 | —.1446 .0532 .0120 
31/0/21] 0 —.0819 | —.0641 | —.0355 .0156 .0039 
1712 .1434 .0895 .0431 .0114 
3 | 0/0 | 2 | 5985 5185 .3527 1886 0546 
| | 

2256 1972 | —.1325 .0679 .0187 
0628 0519 | —-.0315 0148 .0038 
| —.9809 | —.6345 | —.2907 1161 
2}/1/2 {0 —.0819 | —.0641 | —.0355 .0156 .0039 
Rieti. —.7622 | —.4757 | —.2115 .0835 0197 
TABLE 11.—Concluded



05 | 10 .20 40 
—.5091 | —.2683 .0866 0248 | —.0044 
—.0819 | —.0641 .0355 0156 | —.0039 
—.0819 | —.0641 .0355 .0156 | —.0039 
3301 2832 | —.0949 .0261 
7631 .6703 4727 2656 .0814 
6591 .5855 4211 .2401 
4943 .4333 .3003 1625 .0473 
—.6174 | —.3597 .0532 | —.0120 
3301 2832 .1864 .0949 .0261 
—.7622 | —.4757 2115 .0835 | —.0197 
—1.4425 | —.8874 .3876 | —.0355 
1712 .0895 .0431 
—.7622 | —.4757 2115 .0835 | —.0197 
—.7622 | —.4757 2115 .0835 | —.0197 
—.3502 | —.1284 0103 0269 0103 
.0628 0519 .0315 .0148 .0038 
—.3006 | —.2229 .1146 .0482 | —.0117 
0628 .0519 .0315 0148 0038 
4943 .4333 .3003 1625 0473 
9279 8228 5958 .3489 1130 
1.0927 .9753 .7200 .4358 1486 
9279 .8228 .3489 1130 
— .1860 .0217 .0945 .0315 
7631 .6703 4727 .2656 0814 
—.3502 | —.1284 0103 0269 0103 
—1.4425 | —.8874 .3876 .1514 —.0355 
5985 .3527 .1886 0546 
—.5091 | —.2683 .0866 .0248 | —.0044 
—1.4425 | —.8874 .3876 1514 | —.0355 
—1.4425 | —.8874 .3876 1514 | —.0355 
4354 .3703 1219 .0335 
| —.6174 | —.3597 .0532 | —.0120 
—.9809 | —.6345 .2907 1161 | —.0275 
—.6174 | —.3597 .0532 | —.0120 
|  —.1860 .0217 1242 .0945 .0315 

| 
2879 | .2396 | .1486 | —-.0716 .0189 
| —.4307 | —.2859 | —.1294 | -—.0507 | —.0018 
—.2006 | —.1678 | —.1004 | —.0458 | —.0116 
2256 | .1972 1325 | | —-.0187 
6591 4211 .2401 | 0741 
1.0927 9753 7200 .4358 
TABLE 12

s   a   b + c   d        θ₁ = .05      .10      .20      .30      .40
0 0 0120 0090 .0045 | 0004 
1 0 — .0382 | —.0281| —.0139| —.0056| —.0013 
|} 4 o/|] 1 0979 0747 .0392 0164 | 0039 
«@ 0979 0747 .0392 | 0164 0039 
| oO ei 4 —.6174 | —.3597| —.1446| —.0532| —.0120 
| 0 0 | 2 5154 4297 2671 | 1289 .0341 
| 
ee 0 | 0 | 0373 0277 .0139 0056 | .0013 
2 1 0 —.0740 | —.0528| —.0249| —.0096| —.0022 
2 0 1 1993 1543 .0824 0346 | .0083 
1 2 0 0542 0386 .0175 0063 | .0014 
1 1 1 —.5782 | —.3270| —.1244| —.0435| —.0094 
1 0 2 6252 5235 .3273 1582 0417 
0 3 0 2076 1680 .0984 0451 O115 
0 2 1 —.7622 | —.4757| —.2115 | —.0835 | —.0197 
0 1 2 — .3502 | —.1284 .0103 | 0269 0103 
| 0 0 3 1.0706 9308 .6361 3405 0984 
4 | 4 0 0 0763 0568 0283 | 0114 .0026 
1 0 —.1064 | —.0732| —.0325| —.0121| —.0027 
=. 0 1 3034 .2378 .1293 | 0547 .0131 
2 | 2 0 0108 .0034 | —.0026| —.0025| —.0008 
—.5261 | —.2848 | —.0995| —.0319| —.0065 
| | 
7 7353 .6180 .3891 | .0498 
1 3 | oO | 1632 .1298 .0727 | 0317 0078 
1 | 2 | 4 | —.7877} —.4859| —.2092 | —.0801 —.0185 
1 | 1 | 2 — .2462 | —.0439 .0608 | 0509 0166 
1 0 3 1.1811 | 1.0270 .7031 | 3775 1093 
| o | 41] 3187 2657 1676 | 0225 
| — .6937 | —.4465| —.2188| —.0938 | —.0232 
| 0 2 2 —1.0746 | —.5775| —.1905| —.0539| —.0092 
| 0; 1 3 1856 3389 3347 | 2058 0640 
| 1.6280 | 1.4403 | 1.0343 | 5958 1880 
| 
5 | o | o | .1286 | 0961 0480 .0191 0044 
| 4 1 |. | ~.1343 | —.0884| —.0365| —.0128 | —.0027 
| 4 @ | .4092 | 3245 1795 .0766 0183 
| | —.0322 | —.0307 | —.0208 | —.0100| —.0026 
| 3 1 1 | —.4620| —.2335| —.0698| —.0185| —.0031 
| 
0 2 | | 4522) —-.2206 0582 
3 | .1189/  .0920/ .0478| .0044 
| 2 2 1 —.8075 | —.4900| —.2029| —.0750| —.0169 
2 1 2 1404 .0435 | 1142 0764 0233 
2 | 3 | 1.2017; 1.1233 |  .7706 4152 1206 
TABLE 12.—Continued

s   a   b + c   d        θ₁ = .05      .10      .20      .30      .40

2 | 4 | Oo .2268 1397 .0671 .0177 
Bed 7331 | —.4741 | —.2286| —.0958 .0233 

| a | 2 | 2 | 1.0107} —.5235]} —.1555| —.0361 | —.0042 
1 | 1 | 3 | ~ .2959 4340 .3987 2396 .0737 
1 | 0 | 4 | 1.7386} 1.5367) 1.1031 6366 | 
0 5 0 | .4301 .1278 .0366 
0 4 1 —.5931 | —.3728| —.1905| —.0879 .0228 
0 3 | -1.3547| —.7977| -—.3166| —.1123 .0244 
0 2 3 — .6897 | —.2365 .0540 .0851 _.0334 
0 1 4 7420 .7200 1445 
0 0 5 2.1855 | 1.9507 | 1.4400 .8717 .2972 

6 6 0 0 1932 1451 .0728 0289 .0066 

5 1 0 —.1561 | —.0971| —.0364| —.0117 .0023 
5 oj; 1 5165 4135 2327 1002 .0240 
4 2 | O — .0746 | —.0633 | —.0368 | —.0160 .0039 
4 — .3875 | —.1738 | —.0356 | —.0031 .0008 
4 0 2 9559 8084 5163 .2536 .0671 
3 3 0 0747 .0545 0240 .0078 .0015 
3 2 1 —.8201 | —.4867| —.1923| —.0681 .0148 
3 1 2 — .0331 .1330 .1701 1034 .0305 
3 | 0 3 1.4023 | 1.2196 8383 4535 1322 
2 | 4 0 .2296 1123 .0518 .0132 
2 | 3 1 —.7718| —.4999| —.2360| —.0964 0231 
2 | 2 2 — .9364 | —.4615| —.1163 | —.0165 .0012 
si 4 3 4062 5295 .4636 2744 .0838 
2 | 0 4 1.8492 | 1.6332) 1.1720 6778 | 
1 s 0 3855} .3257| | | ~—.0308 
1 4 1 — .6342 | —.4049, —.2072| —.0942 | .0242 
1 | 3 | 2 | 1.3672} —.7915| —.3002| —.1008| —.0208 
1 | 2 | 3 —.5826| —.1466| .1117 1147} 

aj] a] 4 8525| .9408| .7880| 4827, 1571 

| 

| 5 2.2961 | 2.0472 | 1.5093 9143 3129 

| 6 | O |  .5418 4651 | —_.3200 .1776 0536 
| 14 | —.2888) —.1441 | —.0693 0188 
| 4 | 2 | -1.3191) —.8168 —.3702 | —.1488 | —.0353 

| o | 3 —.7448 | —.1870 —.0182 0070 

O | 2 | 4 | —.1435 -2505| 4115} .2985 1043 

1 5 1.2994} 1.3544 1.1224} 7109 2465 

0 0 6 | 2.7430) 2.4612) 1.8476) 1. 
d        θ₁ = .05      .10      .20      .30      .40
0 | 2684 .2032 .1028 .0408 | .0093 
0 | —.1703| —.0981| —.0319| —.0087| —.0015 
1 | .6247| .2884| .0302 
0 | —.1162 | —.0939 | —.0504 .0206 | —.0048 
1 | —.3043 | —.1068 0029} 0141} .0051 
| 
2 | 1.0663 9041 .5814 .2876 | .0764 
0 | 0306 .0175 .0014 .0025 | —.0011 
1 | —.8237| —.4752 |} —.1774 0594 | —.0124 
0750 .1318 | .0380 
3 | 1.5129 | 1.3160 .9064 .4925| .1442 
| 
0 | 1852 .1497 .0855 .0374 | 
1 | —.8095| —.5233| —.2406 .0954 | —.0223 
2 | —.8534| —.3925| —.0732|  .0048| .0070 
3 | 5166 6252 -5293 | 3101 .0942 
4 | 1.9597 | 1.7297 | 1.2410 | 7193 2296 
| | | 
0 | 3409 | 2866 1838 | .0925 | 
1 | —.6751| —.4365| —.2225| —.0994|} —.0252 
| -1.3708 | —.7769| —.2793 | —.0875| —.0167 
| —.4745| —.0551 1712 | 1455 | .0508 
4 9631 | 1.0372 8563 | 5224 | 1700 
| | | 
s | 2.4067| 2.1437! 1.5786| .9871| —.3287 
0 .4970 | 4254 2896 | 1581 | 
1 — .5303 | —.3219| —.1642 0790 | —.0213 
2 —1.3565 | —.8385 | —.3696 | —-1432 | — .0331 
| —1.4009 | ~—.6749 | —.1409 | -0063 | 0142 
4 —.0331 | 4777 | | 1158 
5 1.4100 | 1.4509} 1.1914| .7528| .2614 
6 2.8536 | 2.5577| 1.9170} 1.2003}  .4384 
0 6537 | —.5660 .4004| =.2315 | ~—-.0732 
1 — .3846 | —.2024| —.0886| —.0413 | — .0112 
2 | —1.2229| —.7576 | —.3720| —.1660| —.0422 
3 | —1.9107 | —1.0674 | —.3649 .1010 | —.0153 
4 | —1.0240| —.3345 1158 | .1639 .0677 
5 | .4134| .7584| .8062| .5536| .1985 
6 1.8569 | 1.8649} 1.5291 9923 | 
7 3.3005 | 2.9718 | 2.2557 | 4460 | 5559 
0 0 | z000°—| 000° — | s000° — | | 
0 | 1000°—| z000" — | 000° — | 1100° — | tl 
0 1000" — | — | o100° — | 9100° — 
0 | 7000'—|s000°—/|¢100°—|1z00°—| 0 0 0 0 0 0 0 0 0 0 ZI 
0 | z000°—| 0 0 0 | 1000°— | 1000°— 0 0 0 0 
1000°—| 000° 0100°—| 0 0 | 1000°—| 1000°— | 1000" — 0 0 0 |1000°'— |1000°'— o1 
© 1000" so00"— | #100"— | 0 0 | 1000°—| zo00"— | e000" — 0 0 |1000'— |zo00'— 6 
 7000°—| 2000°—|0z00°—| 2900'—| 0 1000°—| zo00" —| #000" — | 9000° — 0 0 | 1000°— | #000'— |9000°- | 8 
S  2000°—| 0100" —| 1900°—| 2800"—| 0 2000" —| s000° —| 6000" — | 1100" 0 |1000'— | ¢000'— |so00'— | 
not affect the validity of a sequential test, but his proof of the optimum character 
of the sequential test does not cover dependent observations. I suspect, but have 
not been able to prove, that the sequential probability ratio test is optimum for 
this class of dependence also. 

The ease and exactness with which probability ratio scores may be combined is 
particularly important when the data are of mixed known and unknown phase, 
since the alternative u score theory provides only a rough approximation in small 
samples (Finney, 1943; Smith, 1953). This is a critical point, not only for human 
pedigrees, but especially in laboratory vertebrates where linkage studies are of 
secondary interest and the material on any particular pair of loci is usually hetero- 
geneous and small. 


16. INSTRUCTIONS FOR ANALYSIS 


Although the simplicity of the sequential probability ratio test allows the in- 
vestigator to modify his methods to fit particular situations, it may be useful to 
set down here instructions for the routine case of unrelated families, tested parents, 
known parental genotypes, and unknown phase. 

Step 1. Define the method of selection. This comprehends both ascertainment of 
families and rejection of some kinds of ascertained families. Usually, families with 
untested parents or of doubtful mating type will be rejected; otherwise, cf. §§12-13. 
For each factor selection may be complete, truncate, or arbitrary (§7). With respect 
to the two factors in a linkage test, there are three important methods of selection: 

(i) Complete selection of one or both factors. 

(ii) Truncate selection of both factors. 

(iii) Arbitrary selection of one factor (G), truncate selection of the other (T). 

Step 2. Choose the alternative hypothesis (cf. §15). If the amount of data that
can be obtained with “reasonable” effort is likely to be small, choose θ₁ = .05 or
.10; if a moderately large amount of data is hoped for, choose θ₁ = .20 or .30; if an
extraordinarily large amount is anticipated, take θ₁ = .40. Usually, log B = −2
and log A = 3 are appropriate choices for the other parameters of the test.

Step 3. Classify the mating type of each family according to tables 4-8, and
distribute the children among classes a, b, c, d, ... . In these tables, G₁, G₂ and
T₁, T₂ denote factors without dominance or rare “dominants”, while G, g and T, t
are factors showing simple dominant-recessive relationships.

Step 4. Determine the score for each family from tables 10-18, or compute directly, 
using common logarithms. The following outline may be helpful in performing the 
above steps. 


Classification of matings, methods of selection, and scores (z)

I. Double backcross, and single backcross with no dominance in the intercross factor.
   (i) Complete selection of either factor                               z₁
   (ii) Truncate selection of both factors                               z₁ + c₁
   (iii) Arbitrary-truncate selection                                    z₁ + e₁
TABLE 18.—LOD SCORES FOR INDIVIDUAL PROGENY WHEN THE PARENTAL PHASE IS KNOWN

                                              θ₁
                             .05       .10       .20       .30       .40
2θ₁                      −1.0000    −.6990    −.3979    −.2218    −.0969
2(1 − θ₁)                  .2788     .2553     .2041     .1461     .0792
2(2 − θ₁)/3                .1139     .1027     .0792     .0544     .0280
2(1 + θ₁)/3               −.1549    −.1347    −.0969    −.0621    −.0300
4(3 − 2θ₁ + θ₁²)/9         .1106     .0965     .0694     .0440     .0207
4(2 + θ₁ − θ₁²)/9         −.0410    −.0320    −.0177    −.0078    −.0019
4(1 − θ₁ + θ₁²)/3          .1038     .0840     .0492     .0226     .0058
4(2 + θ₁²)/9              −.0506    −.0490    −.0426    −.0320    −.0177
2(1 + 2θ₁ − 2θ₁²)/3       −.1367    −.1042    −.0555    −.0238    −.0058
2(1 − 2θ₁ + 2θ₁²)          .2577     .2148     .1335     .0645     .0170
II. Single backcross with dominance in the intercross factor.
   (i) Complete selection of either factor                               z₂
   (ii) Truncate selection of both factors                               z₂ + c₂
   (iii) Arbitrary selection of intercross factor, truncate selection
         of backcross factor                                             z₂ + e₂
   (iv) Arbitrary selection of backcross factor, truncate selection
         of intercross factor                                            z₂ + d₂
III. Double intercross with dominance in both factors.
   (i) Complete selection of either factor                               z₃
   (ii) Truncate selection of both factors                               z₃ + c₃
   (iii) Arbitrary-truncate selection                                    z₃ + e₃
IV. Double intercross with dominance in one factor.
   (i) Complete selection of either factor                               z₄
   (ii) Arbitrary selection of factor with no dominance, truncate
         selection of dominant factor                                    z₄ + e₄
V. Double intercross with no dominance in either factor.
   (i) Complete selection of either factor                               z₅
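Step 4 allows scores to be computed directly with common logarithms instead of being read from the tables. The following is a minimal sketch in Python. It assumes that each per-progeny score in Table 18 is log₁₀ of the ratio expression printed in its row, which reproduces the tabulated four-decimal entries. The names `TABLE_18`, `progeny_score` and `table_row` are hypothetical, and `t` stands for θ₁.

```python
import math

# Hypothetical lookup: Table 18 row label -> ratio as a function of t = theta1.
# The score of one progeny of known parental phase is log10 of this ratio.
TABLE_18 = {
    "2t": lambda t: 2 * t,
    "2(1-t)": lambda t: 2 * (1 - t),
    "2(2-t)/3": lambda t: 2 * (2 - t) / 3,
    "2(1+t)/3": lambda t: 2 * (1 + t) / 3,
    "4(3-2t+t^2)/9": lambda t: 4 * (3 - 2 * t + t * t) / 9,
    "4(2+t-t^2)/9": lambda t: 4 * (2 + t - t * t) / 9,
    "4(1-t+t^2)/3": lambda t: 4 * (1 - t + t * t) / 3,
    "4(2+t^2)/9": lambda t: 4 * (2 + t * t) / 9,
    "2(1+2t-2t^2)/3": lambda t: 2 * (1 + 2 * t - 2 * t * t) / 3,
    "2(1-2t+2t^2)": lambda t: 2 * (1 - 2 * t + 2 * t * t),
}

THETAS = (0.05, 0.10, 0.20, 0.30, 0.40)  # column headings of Table 18


def progeny_score(row: str, theta1: float) -> float:
    """Common-logarithm (lod) score for one progeny of known parental phase."""
    return math.log10(TABLE_18[row](theta1))


def table_row(row: str) -> list:
    """One row of Table 18, rounded to four decimals as printed."""
    return [round(progeny_score(row, t), 4) for t in THETAS]


if __name__ == "__main__":
    for label in TABLE_18:
        print(f"{label:>16}", table_row(label))
```

Because each score is a logarithm, the score of several progeny of known phase is the sum of their individual scores. Case 2 below relies on this when it factors z₂.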

Step 5. Accumulate the family scores (z). If Σz < log B, conclude that the
frequency of recombination θ is significantly greater than θ₁, on the assumptions of
§1. If Σz > log A, conclude that θ is significantly less than 1/2. Review the data
and assumptions before deciding that true linkage is present. If log B < Σz <
log A, suspend judgment about linkage until further data lead to a decision. More
data can also be used to estimate θ after linkage has been detected, or to make a
further test for linkage in the range θ₁ < θ < 1/2, if that seems advisable.
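The Step 5 rule is the usual sequential probability ratio boundary check applied to a running total. Below is a minimal Python sketch of that rule. It assumes the default boundaries log B = −2 and log A = 3 from Step 2, and it assumes families are scored in the order they are obtained. The function name and the decision labels are hypothetical.

```python
from typing import Iterable, Tuple

LOG_B = -2.0  # lower boundary suggested in Step 2
LOG_A = 3.0   # upper boundary suggested in Step 2


def sequential_decision(
    family_scores: Iterable[float], log_b: float = LOG_B, log_a: float = LOG_A
) -> Tuple[str, int, float]:
    """Accumulate family scores and apply the Step 5 boundaries.

    Returns (decision, families_used, running_total). Scoring stops at the
    first family whose running total crosses a boundary; otherwise the
    decision is "continue" (suspend judgment and collect more data).
    """
    total, used = 0.0, 0
    for z in family_scores:
        total += z
        used += 1
        if total < log_b:
            return "theta > theta1", used, total  # no linkage as close as theta1
        if total > log_a:
            return "theta < 1/2", used, total     # evidence of linkage
    return "continue", used, total


# Cases 1 and 2 (theta1 = .20) treated as two families of one hypothetical study
print(sequential_decision([-0.3876, -0.3438]))
```

With these two families the total is about −.7314. That lies between −2 and 3, so judgment would be suspended. Stopping at the first crossing, rather than summing all available data first, is what makes the procedure sequential.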

The following examples illustrate the scoring procedure. 

Case 1. A mating of type GT × gt gives 2GT, 2Gt, and 1gt progeny. This is a double
backcross (mating 1) with s = 5, a + d = 3. The score for complete selection is
z₁ (table 10). For truncate selection of both factors, add the correction factor c₁
(table 13), and for truncate selection of the T factor but arbitrary selection of the
G factor (which shows 4G : 1g) add e₁ with s₁ = 4, s₂ = 1 (table 14). For θ₁ = .20,
we find z₁ = −.3876, z₁ + c₁ = −.3895, and z₁ + e₁ = −.3829.
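The complete-selection score z₁ of Case 1 can be checked directly. The Python sketch below assumes that, for a double backcross of unknown phase, z₁ is log₁₀ of the average of the two phase-specific probability ratios, with the two parental phases taken as equally likely. Each child contributes 2θ₁ if recombinant and 2(1 − θ₁) if not, as in Table 18. The function name is hypothetical.

```python
import math


def z1_double_backcross(s: int, k: int, theta1: float) -> float:
    """Hypothetical helper: complete-selection score of a double backcross
    family of unknown phase.

    s      -- number of children in the family
    k      -- children in classes a + d (the other s - k are in b + c)
    theta1 -- alternative recombination fraction
    """
    rec, nonrec = 2 * theta1, 2 * (1 - theta1)  # per-child ratios (Table 18)
    # Under one phase the k children in a + d are recombinants; under the
    # other phase the s - k children in b + c are. Average the two ratios.
    ratio = 0.5 * (rec ** k * nonrec ** (s - k) + rec ** (s - k) * nonrec ** k)
    return math.log10(ratio)


# Case 1: s = 5, a + d = 3
for t in (0.05, 0.10, 0.20):
    print(t, round(z1_double_backcross(5, 3, t), 4))
```

At θ₁ = .20 this gives −.3876, the value quoted for Case 1. The correction terms c₁ and e₁ for incomplete selection are not modelled here.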
Case 2. A mating of type GT × Gt gives 5GT, 2gT, 3Gt, and 1gt progeny. This is
a single backcross (mating 9) with s = 11, a = 5, b = 2, c = 3, and d = 1. Families
of this size are not given in table 11, but the score may quickly be obtained by
factoring the expression for z₂ into

z₂ = 3 log [2(2 − θ₁)/3] + 3 log [2(1 + θ₁)/3] + log 2θ₁ + log 2(1 − θ₁)
     + (score of the single backcross family s = 3, a = 2, b = 1, c = d = 0)


The first four terms correspond to progeny of known parental phase (table 18),
the last term to a single backcross family with s = 3, a = 2, b = 1, c = d = 0.
For θ₁ = .20, we find


z₂ = 3(.0792) + 3(−.0969) + (−.3979) + .2041 + (−.0969) = −.3438.

The corresponding scores for incomplete selection are z₂ + c₂ = −.3438 and
z₂ + e₂ = −.3439. Here, as is usual in large families, the corrections for incomplete
selection are negligible.
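The Case 2 arithmetic is a weighted sum of tabulated scores. In the short Python sketch below, each four-decimal entry at θ₁ = .20 is multiplied by the number of times its term occurs in the factored expression, and the results are added. The variable names are illustrative only.

```python
# (multiplicity, tabulated score at theta1 = .20), in the order of the text
terms = [
    (3, 0.0792),   # 3 log[2(2 - theta1)/3], table 18
    (3, -0.0969),  # 3 log[2(1 + theta1)/3], table 18
    (1, -0.3979),  # log 2 theta1, table 18
    (1, 0.2041),   # log 2(1 - theta1), table 18
    (1, -0.0969),  # single backcross family s = 3, a = 2, b = 1, c = d = 0
]

# Logarithms of a product add, so the family score is the weighted sum.
z2 = sum(m * score for m, score in terms)
print(round(z2, 4))
```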
17. SUMMARY


The sequential probability ratio test for linkage detection in man is simple, exact 
and efficient. The basic assumptions of the linkage test are discussed, and criteria 
are developed for the choice of parameters in the sequential test. For the case of 
double backcross sib-pairs, the sequential tests considered here require less than 
1/3 as many observations for a given risk of error as the Fisher-Finney u score 
method and about 1/5 as many observations as the Haldane-Smith nonsequential 
probability ratio test. Formulae for “lod” scores are given for a variety of mating 
types and methods of selection, and the research worker should have no difficulty 
extending the formulae to novel cases as they arise. The optimum property of the 
sequential probability ratio test holds for mixed data, the combination of which is 
easy and exact. Examples and tables of scores are given for the most important 
mating types. 


The work for this paper was done under the direction of Dr. J. F. Crow, to whom the author is 
indebted for many stimulating discussions and constant encouragement. Drs. E. R. Immel, W. J. 
Schull, and C. A. B. Smith read the preliminary manuscript and offered helpful comments. Thanks 
are also due to the Numerical Analysis Laboratory of the University of Wisconsin, and especially 
to Mr. William Graebel, for assistance in computing the tables of linkage scores. 


REFERENCES 


BAILEY, N. T. J. 1951. A classification of methods of ascertainment and analysis in estimating the
frequencies of recessives in man. Ann. Eugen. 16: 223-225.

BERNSTEIN, F. 1931. Zur Grundlegung der Chromosomentheorie der Vererbung beim Menschen mit
besonderer Berücksichtigung der Blutgruppen. Z. indukt. Abstamm. u. VererbLehre 57: 113-138.

BRIDGES, C. B., AND K. S. BREHME 1944. The mutants of Drosophila melanogaster. Carnegie Inst.
Wash. Publ. 552.




318 NEWTON E. MORTON 


BROSS, I. 1952. Sequential medical plans. Biometrics 8: 188-205.

CARTER, T. C. 1955. The estimation of total genetical map lengths from linkage test data. J. Genet.
53: 21-28.

CREW, F. A. E., AND P. CH. KOLLER 1932. The sex incidence of chiasma frequency and genetical
crossing-over in the mouse. J. Genet. 26: 359-384.

FINNEY, D. J. 1940. The detection of linkage. Ann. Eugen. 10: 171-214.

FINNEY, D. J. 1941a. The detection of linkage. II: Further mating types; scoring of Boyd's data.
Ann. Eugen. 11: 10-30.

FINNEY, D. J. 1941b. The detection of linkage. III: Incomplete parental testing. Ann. Eugen. 11:
115-135.

FINNEY, D. J. 1942. The detection of linkage. VI: The loss of information from incompleteness of
parental testing. Ann. Eugen. 11: 233-242.

FINNEY, D. J. 1943. The detection of linkage. VII: Combination of data from matings of known and
unknown phase. Ann. Eugen. 12: 31-43.

FISHER, R. A. 1935. The detection of linkage with “dominant” abnormalities. Ann. Eugen. 6:
187-201.

HALDANE, J. B. S. 1934. Methods for the detection of autosomal linkage in man. Ann. Eugen. 6:
26-65.

HALDANE, J. B. S. 1946. The cumulants of the distribution of Fisher's “uy,” and “us,” scores used in
the detection and estimation of linkage in man. Ann. Eugen. 13: 122-134.

HALDANE, J. B. S., AND C. A. B. SMITH 1947. A new estimate of the linkage between the genes for
colour-blindness and haemophilia in man. Ann. Eugen. 14: 10-31.

HARRIS, H. 1948. On sex limitation in human genetics. Eugen. Rev. 40: 70-76.

HOGBEN, L. 1934. The detection of linkage in human families. Proc. Roy. Soc. B 114: 340-363.

KOSAMBI, D. D. 1944. The estimation of map distances from recombination values. Ann. Eugen. 12:
172-175.

MOHR, J. 1954. A study of linkage in man. Op. Dom. Biol. Hered. Hum. Univ. Hafn. 33: 1-119.

NEEL, J. V. 1949. The detection of the genetic carriers of hereditary disease. Amer. J. Hum. Genet.
1: 19-36.

PENROSE, L. S. 1953. The general purpose sib-pair linkage test. Ann. Eugen. 18: 120-124.

RHOADES, M. M. 1950. Meiosis in maize. J. Hered. 41: 59-70.

SLIZYNSKI, B. M. 1949. A preliminary pachytene chromosome map of the house mouse. J. Genet. 49:
242-245.

SMITH, C. A. B. 1953. The detection of linkage in human genetics. J. Roy. Stat. Soc. B 15: 153-192.

WALD, A. 1947. Sequential Analysis. New York: Wiley.

WALD, A., AND J. WOLFOWITZ 1948. Optimum character of the sequential probability ratio test.
Ann. Math. Stat. 19: 326-339.




BOOK REVIEWS 


Dystrophia Musculorum Progressiva: Eine genetische und klinische
Untersuchung der Muskeldystrophien


By Prof. P. E. Becker, Tuttlingen. Georg Thieme Verlag, Stuttgart, 1953.
Pp. 311, 101 figures. DM 28.50. 


This monograph reports in detail a genetic and clinical investigation of progressive muscular
dystrophy in the province of Baden, Germany. The material was collected during the period from 1938
to 1940, and includes information on 259 cases, 162 of these being alive at the time of investigation. 

Becker’s results confirm those of previous investigators in finding two major types of progressive 
muscular dystrophy that are genetically distinct. Nine kindreds are described in which the pectoral 
girdle type is transmitted as a dominant characteristic. Eleven apparently isolated cases of the pec- 
toral girdle type were found, some probably representing new mutations, but the writer suggests that 
extrinsic factors, perhaps trauma, may be of importance in some instances. The pelvic girdle or child- 
hood type of muscular dystrophy was found in 64 sibships, 37 of these containing only one affected 
individual. Evidence is presented showing that this group contains families in which the disease is 
transmitted as a sex-linked recessive characteristic and families in which those affected are homo- 
zygous for an autosomal recessive gene. The investigator is unable to distinguish the sex-linked and 
autosomal types of childhood muscular dystrophy clinically. The prevalence of the two types in the 
South Baden area as of July, 1939, is estimated at 0.06 per thousand for the pectoral type and 0.05 
per thousand for the pelvic type. 

Dr. Becker presents his clinical findings and pedigree information in detail. Each kindred is com- 
pletely described, with pedigree drawings, this section occupying 145 pages of the text. The data are 
analyzed extensively from both the clinical and genetic viewpoints. Data are included on other disease 
conditions found in the families studied. The extensive presentation of all available data will make 
this monograph quite valuable to those doing similar research in other areas. Intensive surveys of 
limited geographic areas for specific diseases or groups of diseases of genetic origin are of considerable 
significance to studies of population genetics. This monograph is a very worthwhile addition to the 
literature in this field. An index is not provided, but the table of contents is quite detailed as to the 
topics discussed. The bibliography is extensive, occupying 13 pages. The book is attractively printed 
and bound, and the illustrations, tables and charts are adequate. 

C. NASH HERNDON, M.D.
Bowman Gray School of Medicine 


Genetic Homeostasis 


— 


. MicHAEL LERNER. New York: John Wiley & Sons, Inc., 1954, pp. 134. 


2 


wn 


The history of genetics, as of other sciences, is characterized by the sporadic appearance of cases of
new phenomena. At first the results are uninterpretable; then aspects of similarity emerge ultimately 
leading to a new synthesis and a fresh impetus for further study. 

Perhaps this sequence is nowhere better illustrated than in the developments which led to the con- 
cept of genetic homeostasis as described in this book by I. Michael Lerner. 

It is necessary at the outset to distinguish between various kinds of homeostasis. Physiological
homeostasis is “the totality of steady states maintained in an organism through the co-ordination of
its complex physiological processes.” Developmental homeostasis refers to the stabilizing processes of
embryonic development which tend to eliminate extremes. Psychological homeostasis and ecological
homeostasis may be defined similarly. In particular, genetic homeostasis is “the property of (a mendelian)
population to equilibrate its genetic composition and to resist sudden changes.” Genetic
homeostasis, as Lerner points out, bridges the gap between individual physiological homeostasis on
the one hand and group ecological homeostasis on the other.

The definition of genetic homeostasis raises two questions: what is the evidence for existence of 
this property in mendelian populations and what mechanisms achieve it. Lerner’s book is devoted in 
about equal measures to discussing these questions. 

The evidence is chiefly of three types. First, several selection experiments have resulted in cessation 
or at least deceleration of progress without a corresponding reduction in genetic variance. It appears
that mendelian populations harbor sufficient genetic variability to permit them to respond to artificial 
selection, but that after a certain limit is reached natural selection opposes the artificial selection, 
thereby preventing further progress even in the demonstrable presence of a supply of residual genetic 
variability. Second, numerous cases of phenotypic balance in sparrows, lizards, mice, rabbits, sheep, 
rats, and other forms suggest that phenotypic deviants are sacrificed to natural selection and that 
selection favors those organisms which exhibit mean or near mean values of all quantitatively varying 
traits. 

The third type of evidence consists of several cases in Drosophila, poultry, mice, and other forms 
of the environmental component of variance being larger in homozygotes, smaller in heterozygotes. 

The interpretation of these phenomena in genetic, biochemical, and developmental terms pro- 
vides fascinating opportunities for speculation. Lerner explores several concepts and models of gene 
action and interrelationship which, if not final, will at least open the way for further experimentation 
and theory. 

The student of human heredity will be especially interested in the concept of genetic homeostasis, 
since mankind propagates itself by means of a breeding structure which insures the maintenance of 
maximal or near maximal amounts of heterozygosity. 

EARL L. GREEN
Atomic Energy Commission 


Man's Capacity to Reproduce: The Demography of a Unique Population


By Joseph W. Eaton and Albert J. Mayer. Glencoe, Illinois: The Free
Press, 1954. Pp. 59, $2.00.


A human geneticist might well dream of a population like the Hutterites. They are a vigorous and
interesting people. A lot of them are living within short distances of one another. Their families have 
a median of 9 children. Their non-genetic environment is relatively uniform as Western populations 
go. They have comparatively excellent knowledge about details of their own family histories. They 
have a favorable attitude toward scientific and medical investigation. The only disappointment a 
dreaming medical geneticist might encounter is that they are unusually healthy. 

This Free Press edition is a reprint of a paper which appeared in Human Biology (Vol. 25, no. 3, 
September, 1953, pp. 206-264). The original title is “The Social Biology of Very High Fertility among
the Hutterites.” The subtitles are identical in the two editions.

The Hutterites are an anabaptist sect living (as of 1950) in 93 self-contained colonies in the 
Dakotas, Montana, Alberta, Saskatchewan and Manitoba. The sect originated in Switzerland and 
Bohemia in 1528. During the 17th and 18th centuries they suffered severe persecution at the hands 
of both Catholics and Protestants. In 1762 some members of the sect found sanctuary in Crimea. 
Between 1874 and 1877 nearly all of the faithful Hutterites, fearful of renewed persecution, moved 
from Russia to South Dakota. Other Hutterite colonies with slightly different backgrounds are living 
today in England and Paraguay. 

Between 1880 and 1950 the Hutterite population in North America increased over 19 times, from
443 to 8,542. New colonies are formed at a rate which keeps the typical colony size around 100
individuals. Of the 443 colony Hutterites listed in the 1880 census, 70% bore one of only 5 patronyms,
and 10 additional surnames accounted for the other 30%.

The demographic data analyzed by Eaton and Mayer include records collected by them and their
collaborators on 6,796 individuals living in 71 colonies. The data represent a sample of 80% of the
total Hutterite population living in 1950 and include information on birthdate, birth order including
stillbirths, sex, marital status, occupation and religious leadership, death dates for the decade
1940-1950, and the place of residence of a small number of Hutterites living outside of a colony.

The reproductive performance of the Hutterites is unique among modern Western populations.
The sex ratio is 101 ♂ : 100 ♀. The population is very young: 50.6% are under 15 years of age and
only 2.3% are over 65. More males than females reach ages over 40. The crude birth rate is 45.9 per
1,000 population. The fertility ratio, that is, the number of children under 5 years per 100 women of
ages 15-49, is 96.3. The age-specific fertility rate is 391.1 per 1,000 women in the age group 30-34. The
166 women who were between 45 and 54 in 1950 had a mean of 10.6 live births. Nearly everyone
marries (only after baptism at the age of 19) and only 3.4% of marriages are childless. All of this
reflects the extremely high Hutterite population fecundity: 1 out of each 13 ovulation cycles
terminates in a live birth. This estimate of fecundity is a minimum because it is not corrected for
illness, miscarriages, or normal separation of consorts.

The crude death rate (1941-1950) was 4.4 per 1,000 population. At the current rate of increase 
(4.13% per year), the population will double in number in about 16 years. 

The Hutterites are an ideal population for study of the interplay of genetic, psychological, social 
and cultural variables associated with high fertility. Eaton and Mayer plan such a study for the near 
future. We wish them and their Hutterite collaborators all success in this work. 

J. N. SPUHLER 
Institute of Human Biology 
University of Michigan 


Human Heredity 


By James V. Neel and William J. Schull. Chicago: University of Chicago Press, 1954.
Pp. 361.


Human genetics has come a long way since the days when efforts were devoted mainly to the collection
of pedigrees of rare conditions and to the fitting (by hook or by crook) of the familial distributions 
of human traits to Mendelian expectations. Much of the progress has been due to the development 
of mathematical techniques to deal with the special problems that arise in the analysis of human 
family data. Until now most of these techniques have remained scattered through a variety of genetic, 
biometric and medical journals. The present book draws together a number of the more useful 
statistical methods and discusses their application to current problems in human heredity. Priscilla 
and Vicki (to whom the book is dedicated) should be proud of the result. It is not intended to be 
an exhaustive review of the field, but presents “some of the landmarks of past work in human heredity
and some of the signposts for future development”. The emphasis is on “the methodology ...
far more than the established facts” of human genetics. This is not, therefore, a book for beginning 
students in genetics or for medical students, but will be an invaluable aid to the graduate student 
and others actively working in human genetics. 

After an introductory chapter describing the advantages, as well as the disadvantages, of man as
a subject for genetic study, the authors deal briefly but clearly with the physical basis of heredity. 
The inherited variations of the red blood cell are used as a text to present the concept of the genes, 
the specificity of their control of biochemical processes, and the immense number of different combi- 
nations in which they may occur. A chapter on “Nature and Nurture” emphasizes the complexity 
of gene-environment and gene-gene interactions and discusses the lines of evidence that may con- 
tribute information on the problem. 

The chapters on dominant and recessive inheritance are original in considering the inheritance of 
rare and common genes separately, and including sex-linked and partially sex-linked dominant and 
recessive genes along with their autosomal counterparts, rather than treating them in a separate 
chapter. This section also discusses penetrance, pleiotropy, the Hardy-Weinberg law, the relation of 
age of onset of disease to mode of inheritance, and the relation of consanguinity to recessive inheritance.
Then there is a chapter on “Genes Neither Dominant Nor Recessive” which includes a good
discussion of genetic “carriers” and a table summarising the diseases that may show a carrier state. 

A discussion of quantitative inheritance includes a critical evaluation of the role of correlation in 
estimating the contributions of heredity and environment to a given quantitative character, using 
height and intelligence as type examples. This is followed by a chapter on linkage in which the authors 
present Penrose’s sib-pair methods of linkage detection (the Fisher-Finney method is considered too 
difficult, mathematically, for most readers) and a discussion of the unlikelihood that common genetic 
markers linked to pathological genes will be of any immediate practical use for genetic prognosis. 
In the next chapter the estimation of mutation rates and the problem of induced mutation in man 
are discussed clearly and critically. 

After a section on physiological genetics dealing with a number of inherited metabolic defects in 
man, the reader reaches the meat of the book, “The Estimation of Genetic Parameters and Tests of 
Genetic Hypotheses”. The use of the maximum likelihood method of estimation is demonstrated for 
a number of genetic situations (two and three autosomal alleles without and with dominance, sex-linked
alleles, two pairs of alleles), and the use of χ² as a test of goodness of fit is illustrated. Here
the mathematics becomes quite heavy, for which the authors make no apology, believing that “...
a knowledge of certain branches of mathematics is no less essential to the serious student of human
heredity than to the astronomer ...”. The reviewer has not attempted to check the formulae or
calculations. The following chapter deals efficiently with the knotty problem of ascertainment, though 
the difficulties involved are not dealt with as thoroughly as in a recent article of Schull’s, which 
the authors have modestly omitted even from the bibliography. After some more advanced algebra 
the chapter concludes with a warning that it is no use applying fancy statistics to data that are in- 
adequate, either through improper collection or insufficient understanding of the biological situation. 

The chapter on population genetics deals with the frequencies of genes of universal distribution 
(e.g., blood groups), and of genes of restricted distribution (such as those for Thalassemia and the 
sickling phenomenon), and discusses the factors influencing these frequencies. Due note is taken of 
the possible errors and biases in estimating these factors, but the authors are optimistic about the 
potential contributions of research in this field to our understanding of anthropological problems. 
On the other hand, in the following chapter, they are pessimistic about the value of twin studies as a 
means of appraising nature-nurture interaction. 

The final section of the book deals with the practical aspects of human genetics as applied in the 
fields of epidemiology, counselling, forensic medicine and eugenics. The authors have presented a 
cautious, critical and constructive appraisal of the contributions of genetics in these fields, and this
could be read with profit by doctors, social workers and others who have to deal with human families 
and their genetic problems, or who are otherwise concerned about the future of the human species. 
The book ends on a characteristic note of caution—the suggestion that “the effort which would be 
expended on a eugenics program might better go into efforts to explore the many gaps in our present 
fragmentary information.” 

F. CLARKE FRASER
McGill University 


The Unleashing of Evolutionary Thought 


By Oscar Riddle. New York: Vantage Press. 1954. Pp. 414. $4.50.


This essay was born of a firm conviction that the attitudes and goals of society everywhere would
be vastly matured by general acceptance of the concept of evolution. Since Dr. Riddle realizes that
this conviction is shared by most scientists, the book is primarily an explanation of why evolution is
not more generally accepted. Nor is the book directed to scientists alone. It is written for “your neigh- 
bors and mine” who “are wholly unprepared to give thought to the things that would flow from a 
widely accepted view of the natural origin of man, of his biological and social nature, of the animal 
and social sources of morality, and of a world rid of the supernatural.” For this reason Part I, entitled
“What Evolutionary Thought Is”, has been added. The various chapters in this section cover such
topics as “The Problem of Creation”, “Evolution and Ethics” and “The Biological Inequality of Man”.
It may be seen that these are the everyday ideas which are most likely to be modified by an under- 
standing of evolution in its broadest sense. For example, in the chapter entitled “Social Inheritance” 
Riddle lists three present-day dangers. First, modern technological civilization fails to consider
sufficiently the biological exigencies of man. Second, organized religions hinder understanding of
man’s biological origin. Thus he is prevented from making sound plans for the future of society. 
Third, overpopulation brings into even sharper focus the need for application of eugenic measures. 

It is curious that Dr. Riddle lists these dangers in this order, for it has already become abundantly
clear that the second is the main thesis of this book. Part II, the longest and most detailed, is called
“Reins Held by Religion.” Here is gathered an imposing mass of evidence to show that the organized 
religions hinder not only the dissemination of evolutionary ideas but also receptivity to these
concepts. Riddle’s most interesting contribution in Part II is his discussion of the manner in which
evolution is taught in our secondary schools and universities. He is on firm ground here, for he was
chairman of a committee of the Union of American Biological Societies which studied the teaching of
evolution in high schools of the United States. There is no gainsaying that the information obtained by 
this committee indicates that religious reasons, above all others, are at the basis of the gingerly way 
in which evolution is approached in our schools and colleges. 

The third part, “Opinion and Outlook”, reaches strong, all-embracing conclusions. Dr. Riddle
spares no religion here or abroad in fixing responsibility for the widespread ignorance of evolution and 
its implications. 

Many, if not most, of the ideas herein have been published before. Many readers may think that 
the emphasis on religion is misplaced; that human nature resists evolutionary ideas because they are 
uncomfortable or disquieting for other reasons. Some will feel that the cogency of Dr. Riddle’s
argument is lost in a somewhat leaden style. These criticisms, however, do not detract from the overall
worthiness of his premises. 

This book is not a pleasant one because Dr. Riddle does not spend much time being merely opti- 
mistic. He presents very little evidence that indicates any progress in the “pressing conflict” between 
belief in the supernatural and belief in what science can tell us about ourselves. Nevertheless,
despite the mass of gloomy documentation, one never entirely loses sight of Dr. Riddle’s fundamental
faith in man’s “good purposes” and his “earnest doubts.” 

BEAL HYDE
University of Oklahoma 


